A matrix approach to the Moog ladder filter
-
- KVRist
- Topic Starter
- 46 posts since 20 Jun, 2013 from Berkeley, CA
Yep, you got it right. You build the matrix every time you update the parameters, but for every audio sample you're doing much less work: in the linear case, multiplying a vector by a scalar and multiplying a vector by a matrix (and adding these two results to form the new state vector). That's 20 multiplies.
There are a couple optimizations on top of what you said. First, you can square a Toeplitz matrix in 7 dot products, not 16, because of the symmetries. Lastly, I'm not just doing the repeated squaring any more, I'm starting with a few terms of the Taylor's series for exponentiation and then refining that with repeated squaring. This lets me exceed the accuracy of the original method, with only 7 matrix multiplications.
And all of these operations run pretty fast on modern SIMD hardware, without a need to do a lot of repacking, because they're highly optimized for exactly this kind of math. I seriously doubt x86 is going to need 50 cycles to do a 4x4 matrix multiplication. Even an ARM should be able to do it faster than that.
Hope this illuminates.
There are a couple optimizations on top of what you said. First, you can square a Toeplitz matrix in 7 dot products, not 16, because of the symmetries. Lastly, I'm not just doing the repeated squaring any more, I'm starting with a few terms of the Taylor's series for exponentiation and then refining that with repeated squaring. This lets me exceed the accuracy of the original method, with only 7 matrix multiplications.
And all of these operations run pretty fast on modern SIMD hardware, without a need to do a lot of repacking, because they're highly optimized for exactly this kind of math. I seriously doubt x86 is going to need 50 cycles to do a 4x4 matrix multiplication. Even an ARM should be able to do it faster than that.
Hope this illuminates.
-
- KVRist
- 54 posts since 20 Oct, 2010
The appnote has been revised and updated with a more accurate implementation of the TPT approach.raphx wrote:Ok, I withdraw my bold claim. I was surprised how bad the performance of Will Pirke's implementation was - it didn't actually seem any better than the Huovilainen one. As you say, it's now obvious that it's a wrong implementation.
Now I'll have to more deeply understand the TPT approach so that I can implement it correctly. This will no doubt take a few days.
I'm still fairly optimistic about total bang for the buck, but now it's clear there's more of a tradeoff.
-
- KVRian
- 563 posts since 23 Nov, 2010
Thanks for the explanation.
The DPPS (dot product packed singles) op on X86 takes has a reciprocal throughput of 2 cycles and a latency of 12, so for 16 it'll take 32 cycles + 12 best case. If you don't have SSE4 it takes 3 separate ops to do each dot product, so 48 + approx 12. Then you have a bunch of shuffling to get the scalar results back into matrix form. I figure about 2 cycles per register.raphx wrote: And all of these operations run pretty fast on modern SIMD hardware, without a need to do a lot of repacking, because they're highly optimized for exactly this kind of math. I seriously doubt x86 is going to need 50 cycles to do a 4x4 matrix multiplication.
Chris Jones
www.sonigen.com
www.sonigen.com
- KVRAF
- 8493 posts since 12 Feb, 2006 from Helsinki, Finland
He's not working on x86 though.. but how about packing the matrix in column-major?sonigen wrote:Thanks for the explanation.
The DPPS (dot product packed singles) op on X86 takes has a reciprocal throughput of 2 cycles and a latency of 12, so for 16 it'll take 32 cycles + 12 best case. If you don't have SSE4 it takes 3 separate ops to do each dot product, so 48 + approx 12. Then you have a bunch of shuffling to get the scalar results back into matrix form. I figure about 2 cycles per register.raphx wrote: And all of these operations run pretty fast on modern SIMD hardware, without a need to do a lot of repacking, because they're highly optimized for exactly this kind of math. I seriously doubt x86 is going to need 50 cycles to do a 4x4 matrix multiplication.
edit: I'll expand a bit. Basically, you don't need dot-products if you store the columns of the (leftmost) matrix and then calculate the matrix multiply as a "sum of outer products" which results in the same amount of total multiplies and adds, but you get to do the adds with regular addps and move the shuffling to the input (rather than output). I haven't tried, but I suspect it almost might pipeline a tiny bit better.
-
- KVRist
- Topic Starter
- 46 posts since 20 Jun, 2013 from Berkeley, CA
Syrou: Yes, he's been in touch with me as well. I took a look at the revised app note, and it looks good to me now. Certainly the frequency response plots are as expected.
mystran and sonigen: Yes, the matrix is in column major order, so the core of matrix multiply is just vmla.f32, and the results end up exactly where they need to go, no repacking necessary. There's a pretty good blog on how to do it in the general 4x4 case: http://blogs.arm.com/software-enablemen ... plication/
But I had to do it better, of course. I wanted to exploit the Toeplitz symmetry of this particular problem and reduce the number of operations required. Here's the core of matrix squaring (input is q0 and q2, output is q12 and q13):
Here the vext instructions are striping the two fragments of the matrix into a full 4x4 matrix (q0, q8, q9, q10 are the columns). According to Pulsar, the whole thing takes 32 cycles on Cortex-A8.
http://pulsar.webshaker.net/ccc/sample-75cbfede
I was pleased by the way the whole thing fit together. The whole matrix generation code, from frequency and resonance to A matrix and B vector, is 100 lines of assembler, not bad for how obsessively optimized this is.
mystran and sonigen: Yes, the matrix is in column major order, so the core of matrix multiply is just vmla.f32, and the results end up exactly where they need to go, no repacking necessary. There's a pretty good blog on how to do it in the general 4x4 case: http://blogs.arm.com/software-enablemen ... plication/
But I had to do it better, of course. I wanted to exploit the Toeplitz symmetry of this particular problem and reduce the number of operations required. Here's the core of matrix squaring (input is q0 and q2, output is q12 and q13):
Code: Select all
neon_ladder_mkmatrix2:
vext.32 q8, q2, q0, #3
@ q0 = {a11, a21, a31, a41}, q2 = {0, a14, a24, a34}
@ square the matrix
vmul.f32 q12, q0, d0[0]
vmul.f32 q13, q8, d4[1]
vext.32 q9, q2, q0, #2
vmla.f32 q12, q8, d0[1]
vmla.f32 q13, q9, d5[0]
vext.32 q10, q2, q0, #1
vmla.f32 q12, q9, d1[0]
vmla.f32 q13, q10, d5[1]
vmla.f32 q12, q10, d1[1]
vmla.f32 q13, q2, d0[0]
http://pulsar.webshaker.net/ccc/sample-75cbfede
I was pleased by the way the whole thing fit together. The whole matrix generation code, from frequency and resonance to A matrix and B vector, is 100 lines of assembler, not bad for how obsessively optimized this is.
- KVRAF
- 8493 posts since 12 Feb, 2006 from Helsinki, Finland
Yeah well, sadly SSE [at least the versions you can realistically expect] in comparison lacks a scalar-vector multiply, so you have to shuffle any scalars into all the components for regular packed multiply.. but I suspect it'd still be more efficient to do it with outer-products, especially since you don't need any bleeding edge extensions (apparently the fused multiply-add is coming, but in practice it's probably another 10 years before you can expect to rely on people actually having it).
-
- KVRian
- 563 posts since 23 Nov, 2010
After seeing that ARM code it makes you wonder how Intel keeps getting things so wrong with SSE. It's like they design a bicycle and then decide to only give you one peddle.
I'm trying very hard not to rant here.
I'm trying very hard not to rant here.
Chris Jones
www.sonigen.com
www.sonigen.com
-
Smashed Transistors Smashed Transistors https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=339459
- KVRist
- 142 posts since 10 Oct, 2014
Hello,
thanks to all for this thread, i've read it some 10 days ago while digging up some old C code to use them within Reaper (jsfx) . I've digged up an implementation of the state variable filter that used the same kind of method http://forum.cockos.com/showthread.php?t=166480.
Step invariant transform seem to me the best method for linear low pass filters, but it is a pain in the feedback to get it working.
I've got a simplified iterative method that gives the coefficients based on the "virtual oversampling"
.
I've also tried to use the taylor approximation of the exp(M)
, it converges faster but it provided me some unstable results.
I found an analytical form a few days ago for the SVF... i can imagine the hard work for finding it for the 24dB-filter-made-of-four-lossy-integrators-with-negative-feedback !!!
But as the numerical method converges very fast, i will only use it as a test/reference
Again, thanks for this thread.
Thierry
thanks to all for this thread, i've read it some 10 days ago while digging up some old C code to use them within Reaper (jsfx) . I've digged up an implementation of the state variable filter that used the same kind of method http://forum.cockos.com/showthread.php?t=166480.
Step invariant transform seem to me the best method for linear low pass filters, but it is a pain in the feedback to get it working.
I've got a simplified iterative method that gives the coefficients based on the "virtual oversampling"
.I've also tried to use the taylor approximation of the exp(M)
, it converges faster but it provided me some unstable results.I found an analytical form a few days ago for the SVF... i can imagine the hard work for finding it for the 24dB-filter-made-of-four-lossy-integrators-with-negative-feedback !!!
Again, thanks for this thread.
Thierry
Last edited by Smashed Transistors on Mon Oct 05, 2015 6:52 pm, edited 1 time in total.
- KVRAF
- 12615 posts since 7 Dec, 2004
Please stop using the term "state-variable" to refer to something concrete. This is an abstract term which applies to all of these filter implementations.
Likewise the term "moog filter" is ridiculous, although it can be assumed by those aware of the jargon, there were numerous different filters used in equipment manufactured by moog (corp) and others in association with it.
I believe when you use the term "state-variable" you intend to describe a "kerwin-huelsman-newcomb filter" which is a particular implementation of a state-variable filter.
When you use the term "moog filter" you intend to describe four inverting active lossy integrators with negative global feedback. I would suggest the term "generic four-pole OTA" or "four cascaded inverting lossy-integrator".
If you mean "mini-moog filter" (which I assume you do) you must recognize that it is foolish to use the term in a generic way when you are not intending to implement the specific properties resulting from the very specific layout of this particular filter.
I find use of the term "moog filter" as a generic to be especially offensive due to the fact I'm left with little to no idea what you're actually referring to when you use the term. It would be equally productive to use the term "warm filter".
Only if you truly have such a bold and narrow focus on one particular electronic circuit can it be justified to use the term "mini-moog filter". This would make it clear what you are referring to. Otherwise I'm sorry but you'll simply need to use more words to clearly describe the topic.
Most often I am left with the impression the term "four-pole/24db filter" is the true intent.
Likewise the term "moog filter" is ridiculous, although it can be assumed by those aware of the jargon, there were numerous different filters used in equipment manufactured by moog (corp) and others in association with it.
I believe when you use the term "state-variable" you intend to describe a "kerwin-huelsman-newcomb filter" which is a particular implementation of a state-variable filter.
When you use the term "moog filter" you intend to describe four inverting active lossy integrators with negative global feedback. I would suggest the term "generic four-pole OTA" or "four cascaded inverting lossy-integrator".
If you mean "mini-moog filter" (which I assume you do) you must recognize that it is foolish to use the term in a generic way when you are not intending to implement the specific properties resulting from the very specific layout of this particular filter.
I find use of the term "moog filter" as a generic to be especially offensive due to the fact I'm left with little to no idea what you're actually referring to when you use the term. It would be equally productive to use the term "warm filter".
Only if you truly have such a bold and narrow focus on one particular electronic circuit can it be justified to use the term "mini-moog filter". This would make it clear what you are referring to. Otherwise I'm sorry but you'll simply need to use more words to clearly describe the topic.
Most often I am left with the impression the term "four-pole/24db filter" is the true intent.
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.
-
Smashed Transistors Smashed Transistors https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=339459
- KVRist
- 142 posts since 10 Oct, 2014
- KVRAF
- 12615 posts since 7 Dec, 2004
Explain which topology and which properties you want to refer to.
I suspect "four-pole filter" is what you want.
https://en.wikipedia.org/wiki/Electroni ... r_topology
BTW, I know it often looks like I'm replying to a particular post. In general if I did not directly quote a post, my reply is to the thread and not to a specific post.
I suspect "four-pole filter" is what you want.
https://en.wikipedia.org/wiki/Electroni ... r_topology
BTW, I know it often looks like I'm replying to a particular post. In general if I did not directly quote a post, my reply is to the thread and not to a specific post.
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.
-
Smashed Transistors Smashed Transistors https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=339459
- KVRist
- 142 posts since 10 Oct, 2014
Sure...
i wanted to thank people here, not to trigger this kind of allergic reaction Mr aciddose.
i wanted to thank people here, not to trigger this kind of allergic reaction Mr aciddose.
Last edited by Smashed Transistors on Sat Oct 03, 2015 11:18 pm, edited 2 times in total.
- KVRAF
- 12615 posts since 7 Dec, 2004
It's important to point out when people are using irrational/unnecessary jargon. I've been doing it for over a decade now and I've made what appears to be some small difference during this time, although perhaps my efforts simply coincidentally correspond to more knowledgeable individuals appearing in discussions of these topics.
https://en.wikipedia.org/wiki/Topology_ ... tronics%29
Some of these Wikipedia articles are relatively recent and I think it would be quite helpful for people to read up on them.
In many cases terms such as "ladder filter" or "moog filter" are used as generics when I suspect people don't realize the ambiguity in these terms. For example a series of repeated kerwin-huelsman-newcomb filters (or integrators comprising sections thereof) are classified as a "ladder" in terms of topology.
"Moog filter" is ambiguous as I described due to the fact that there were many different circuits employed in both the various separate synthesizers as well as synthesizer modules produced by r.a. moog co / moog music over the years, some of which are still produced today.
In fact you could refer to a "moog ladder filter" thinking you're referring to a generic four-pole low-pass filter with negative feedback, while in fact there was a moog music module implementing a high-pass ladder without any feedback!
Being technical and specific is important to avoid confusion. In these cases if the desire were to refer specifically to the minimoog filter configured as a low-pass, I'd go with "minimoog low-pass filter" which should avoid any possibility for confusion.
In all other cases, when referring to a generic four-pole low-pass filter, simply use the term "generic four-pole low-pass filter" !
https://en.wikipedia.org/wiki/Topology_ ... tronics%29
Some of these Wikipedia articles are relatively recent and I think it would be quite helpful for people to read up on them.
In many cases terms such as "ladder filter" or "moog filter" are used as generics when I suspect people don't realize the ambiguity in these terms. For example a series of repeated kerwin-huelsman-newcomb filters (or integrators comprising sections thereof) are classified as a "ladder" in terms of topology.
"Moog filter" is ambiguous as I described due to the fact that there were many different circuits employed in both the various separate synthesizers as well as synthesizer modules produced by r.a. moog co / moog music over the years, some of which are still produced today.
In fact you could refer to a "moog ladder filter" thinking you're referring to a generic four-pole low-pass filter with negative feedback, while in fact there was a moog music module implementing a high-pass ladder without any feedback!
Being technical and specific is important to avoid confusion. In these cases if the desire were to refer specifically to the minimoog filter configured as a low-pass, I'd go with "minimoog low-pass filter" which should avoid any possibility for confusion.
In all other cases, when referring to a generic four-pole low-pass filter, simply use the term "generic four-pole low-pass filter" !
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.
-
Smashed Transistors Smashed Transistors https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=339459
- KVRist
- 142 posts since 10 Oct, 2014
Sorry, i do not agree with you Mr aciddose, the term has been used for the generic kind of 24dB low pass filter for decades, long before kvr, vstis, musicdsp or even the analog heaven mailing list .
Moog's filter has been copied in many ways getting around his patents. May it be specifically a ladder or a series of CA3080, Norton amplifiers or diodes or any digital incarnation ... to me they have the same origin.
Moog's filter has been copied in many ways getting around his patents. May it be specifically a ladder or a series of CA3080, Norton amplifiers or diodes or any digital incarnation ... to me they have the same origin.
Last edited by Smashed Transistors on Mon Oct 05, 2015 6:53 pm, edited 1 time in total.
-
Smashed Transistors Smashed Transistors https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=339459
- KVRist
- 142 posts since 10 Oct, 2014
There is also a lot of things that i do not agree with here and there... but i do not go in "i'm fed up, everybody is ridiculous" mode. Well... so far.
I may admit that i find the generic term "Zero Delay Feedback" very confusing.
What do we call "feedback" ? the complete loop or just the feedback connection (which is generally explicit) ?
What kind of "delay" are we talking about ? group delay, phase delay ? "Z-1" delays caused by explicit vs implicit vs bilinear integrators ?
What does "Zero" mean here ? Analog circuits induce frequency dependant delays http://www.radiolab.com.au/DesignFile/DN004.pdf. So why would we have "zero delay" in a digital implementation ?
As far as i understood, it seems that the so called ZDF consists in solving the explicit feedback connection in order to optimise resonance. As far as i tested, it does not work very well. I had to add some HP filter in the feedback connection much like any other implementation to get correct resonance on HF (i will continue to experiment, maybe i missed something).
The matrix approach goes a step further: it solves the whole feedback loop.
So let's exponentiate that Kerwin-Huelsman-Newcomb-Biquad-Filter and that 24dB-filter-made-of-four-lossy-integrators-with-negative-feedback filter.
I may admit that i find the generic term "Zero Delay Feedback" very confusing.
What do we call "feedback" ? the complete loop or just the feedback connection (which is generally explicit) ?
What kind of "delay" are we talking about ? group delay, phase delay ? "Z-1" delays caused by explicit vs implicit vs bilinear integrators ?
What does "Zero" mean here ? Analog circuits induce frequency dependant delays http://www.radiolab.com.au/DesignFile/DN004.pdf. So why would we have "zero delay" in a digital implementation ?
As far as i understood, it seems that the so called ZDF consists in solving the explicit feedback connection in order to optimise resonance. As far as i tested, it does not work very well. I had to add some HP filter in the feedback connection much like any other implementation to get correct resonance on HF (i will continue to experiment, maybe i missed something).
The matrix approach goes a step further: it solves the whole feedback loop.
So let's exponentiate that Kerwin-Huelsman-Newcomb-Biquad-Filter and that 24dB-filter-made-of-four-lossy-integrators-with-negative-feedback filter.
Last edited by Smashed Transistors on Mon Oct 05, 2015 6:54 pm, edited 1 time in total.
