Moog Ladder Filter - Research Paper

DSP, Plugin and Host development discussion.

Post

Not sure which book you're referring to (Vadim's?)
Yes, exactly :)
These variables are the "implementation details" for the "abstract" integrators.
Actually, that's exactly how I was thinking about them. Glad to hear I was on the right track. Doesn't a z1, z2 naming convention get confusing too? To me, that's very close to z-domain notation representing sample delays.
It depends on the surrounding context. Often this kind of operation (specifically with the "2" term) is found in code for trapezoidal integration when using the common "trick" of implementing the integrator in "TDF2" form.
Ahh, yes, my mistake. Trapezoidal, not Euler's! I didn't know the context of this TDF2 trick... thanks!
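
For anyone following along, here is a minimal sketch of that TDF2-style trapezoidal integrator (names are illustrative; g is the cutoff-dependent integrator gain). The "2" term shows up in the state update:

Code: Select all

// Trapezoidal integrator in "TDF2" form:
//   y[n] = g*x[n] + s[n-1]
//   s[n] = y[n] + g*x[n], which equals 2*y[n] - s[n-1]   <- the "2" term
struct TrapIntegrator {
    float s = 0.f;                   // internal state ("s1", "s2", ... later in the thread)
    float process(float x, float g)
    {
        float y = g * x + s;         // output depends on the *current* input
        s = 2.f * y - s;             // same as s = y + g*x
        return y;
    }
};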

Post

pscorbett wrote: Wed Jan 31, 2024 5:49 am Doesn't a z1, z2 naming convention get confusing too? To me, that's very close to z-domain notation representing sample delays.
Considering that (I believe) it's relatively common to notate the internal state of the z^-1 elements in code as z1, z2 etc., I think it's consistent to notate the internal state of the s^-1 elements as s1, s2 etc.

Post

I got a question:
Can the Trapezoidal method be seen as a 1st order linear prediction of the feedback signal?

Post

Umm... I don't think so? Here, at least, it is just integrating the capacitor voltages, which are in the forward path. Once the next step of these voltages is obtained, I was able to solve the feedback path using Newton's method (Newton-Raphson).

I don't know all of the applications of the trapezoidal method, so I'm not going to assert that it could never be used to solve the feedback path. Many filters have a capacitor in the feedback of an opamp, for instance; maybe it would be useful for something like that.

Post

Tone2 Synthesizers wrote: Wed Jan 31, 2024 12:40 pm I got a question:
Can the Trapezoidal method be seen as a 1st order linear prediction of the feedback signal?
Well, the trapezoidal method essentially takes the average of the previous and current time steps, but the "current time step" is in a sense "predicted" by solving the feedback from the system of equations. The fact that we need to solve the system (i.e. "predict" the feedback) is what makes the method "implicit", as opposed to "explicit" methods where we can just trace the signal flow because every loop always has a delay. Trapezoidal integration, since it uses the current time-step value as part of the integrated value, inherently has a "delay-free" loop (that's where the "ZDF" name comes from).

Post

pscorbett wrote: Wed Jan 31, 2024 2:49 pm Umm... I don't think so? Here, at least, it is just integrating the capacitor voltages, which are in the forward path. Once the next step of these voltages are obtained, I was able to solve the feedback path using Newton's method (Newton Raphson).
The ladder has both local feedback around each integrator (that's how you turn one into a low-pass; an ideal capacitor is a pure integrator) and global feedback around the whole thing, controlled by the resonance "knob": increasing the global (negative) feedback is how a ladder gets its resonance peak.
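
In linearized transfer-function terms (a standard way to write this, with k the amount of global negative feedback):

G(s) = \frac{1}{1 + s/\omega_c}, \qquad H(s) = \frac{G(s)^4}{1 + k\,G(s)^4}

Each G(s) is one integrator wrapped in unity local feedback, and the k G(s)^4 term in the denominator is the global loop that produces the resonance peak.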

Post

Trapezoidal method per se doesn't involve any prediction. However the fact that the output value of a trapezoidal integrator depends on the current input results in feedback loops around such integrators being "zero-delay". This feature is however not unique to trapezoidal integration. E.g. backward Euler suffers from exactly the same problem, whereas forward Euler does not. So theoretically one can build backward Euler-based ZDF filters (although why would one want to), while forward Euler filters are in a sense "automatically ZDF", since they don't need any special resolution of their feedback (by an artificially inserted z^-1 delay, or by solving the zero-delay feedback equation).

The common terminology speaks about "explicit" methods (not causing zero-delay problems) and implicit methods (causing those problems).
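
As a concrete sketch (illustrative names; g is the scaled cutoff), the three one-step update rules for an integrator driven by input x:

Code: Select all

// Explicit: uses only the previous input, so no zero-delay loop.
float forward_euler(float s, float x_prev, float g)  { return s + g * x_prev; }

// Implicit: uses the current input, creating a zero-delay loop in feedback.
float backward_euler(float s, float x_curr, float g) { return s + g * x_curr; }
float trapezoidal(float s, float x_prev, float x_curr, float g)
{
    return s + 0.5f * g * (x_prev + x_curr);  // averages old and new input
}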

Post

Well, I have to be honest, I'm a little stumped now on how to implement this model for real-time DSP. So far, I've tried building it as a Max MSP external using the "Min Devkit" (which is a C++ "wrapper" for the regular C-based Max SDK, from what I understand).

I'm using the "Eigen" linear algebra library, and currently using the Full LU decomposition as the Ax=b solver. I tried partial LU and "colPivHouseholderQr" as well, but found they performed worse in my test program. My very crude C++ timer stated that one iteration of the NR loop took roughly 150 microseconds, and I found in my python prototype that I usually needed about 4 iterations to meet my stop conditions.

Right now, my "external" compiles, and completely dimes my CPU meter without producing an output. I'll post the full code here. If anyone has any ideas to optimize further, I'd really appreciate it!!

Code: Select all

#include "c74_min.h"
#include "../shared/signal_routing_objects.h"
#include <cmath>
#include <Eigen/Dense>

using namespace Eigen;

class ladder : public object<ladder>, public sample_operator<4, 1>
{
private:
    // Matrix and vector declarations
    Matrix <float, 5, 1> y;
    Matrix <float, 5, 1> y_est;
    Matrix <float, 5, 1> delta_y;
    Matrix <float, 5, 1> F;
    Matrix <float, 5, 5> J;
    
    // max NR error (stopping condition)
    float errorThresh = 0.000001;
        
    // accumulator states (capacitors)
    float s1 = 0.f;
    float s2 = 0.f;
    float s3 = 0.f;
    float s4 = 0.f;
    float s5 = 0.f;
    
    
public:
    // metadata
    MIN_DESCRIPTION {"Transistor ladder filter with feedback."};
    MIN_TAGS {"filter, ladder, lpf"};
    MIN_AUTHOR {"Peter Corbett (Osprey Instruments)"};
    MIN_RELATED {"lores~, svf~, onepole~"};

    // IO
    inlet<>  x {this, "(signal) Signal Input"};
    inlet<>  cutoff {this, "(signal) Cutoff Frequency"};
    inlet<>  r {this, "(signal) Resonance"};
    inlet<>  fbk {this, "(number) Feedback"};
    outlet<> output {this, "(signal) Output", "signal"};
    attribute<float> b { this, "bias", 0.65, range {-1.0, 1.0} };
    
    // cheap tanh approximation: tanh(x) = sinh(x) / sqrt(1 + sinh^2(x)),
    // with sinh approximated by a truncated Taylor series
    float tanh2(float x)
    {
        float x2 = x*x;
        float sh = x*(1+x2*((1/6.f) + x2*(1/120.f)));
        return sh / std::sqrt(1.f + sh*sh);

    }

    // highpass filter
    float Fc_hp = 8.f;
    float g_hp = Fc_hp * M_PI / samplerate();
    
    // Constructor
    ladder()
    {
        y.setZero();
        y_est.setZero();
        delta_y.setZero();
        F.setZero();
        J.setZero();
        J(4,4) = -1.f;
    }
        
        
    // Call operator: process a single sample
    sample operator()(sample x, sample cutoff, sample r, sample fbk)
    {
        
        float g = cutoff * M_PI / samplerate();
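        // Newton-Raphson: each sample, solve the nonlinear system F(y) = 0
        // for the stage voltages by repeatedly solving J*delta_y = -F and
        // updating y, starting from the previous sample's solution.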
        
        int i = 0;
        float residue = 100.f;   // large initial value so the loop runs at least once
        while (std::abs(residue) > errorThresh && i < 50)
        {
            // pre-compute tanh functions
            float tanh_x = tanh2(x - r*y(3) + y(4));
            float tanh_y1 = tanh2(y(0));
            float tanh_y2 = tanh2(y(1));
            float tanh_y3 = tanh2(y(2));
            float tanh_y4 = tanh2(y(3));
            float tanh_y5 = tanh2(fbk*(y(3) - b));
            
            // F(y) using current y values
            F(0) = g*(tanh_x-tanh_y1) + s1 - y(0);
            F(1) = g*(tanh_y1-tanh_y2) + s2 - y(1);
            F(2) = g*(tanh_y2-tanh_y3) + s3 - y(2);
            F(3) = g*(tanh_y3-tanh_y4) + s4 - y(3);
            F(4) = tanh_y5 - (s4 / (1.f - g_hp)) - y(4);
            
            // pre-compute re-used algebra (helper "functions")
            float help_x = 1.f - (tanh_x * tanh_x);
            float help_y1 = 1.f - (tanh_y1 * tanh_y1);
            float help_y2 = 1.f - (tanh_y2 * tanh_y2);
            float help_y3 = 1.f - (tanh_y3 * tanh_y3);
            float help_y4 = 1.f - (tanh_y4 * tanh_y4);
            float help_y5 = 1.f - (tanh_y5 * tanh_y5);
            float minus_g = -1.f * g;
            
            // Jacobian Matrix
            // only change these matrix elements (the rest are zeros)
            J(0,0) = minus_g*help_y1 - 1.f;
            J(0,3) = minus_g*r*help_x;
            J(0,4) = g*help_x;
            J(1,0) = g*help_y1;
            J(1,1) = minus_g*help_y2 - 1.f;
            J(2,1) = g*help_y2;
            J(2,2) = minus_g*help_y3 - 1.f;
            J(3,2) = g*help_y3;
            J(3,3) = minus_g*help_y4 - 1.f;
            J(4,3) = fbk*help_y5;
            // J(4,4) = -1.f;
            // Above moved to constructor function
            
            // calculate next NR step
            y_est = y;
            delta_y = J.fullPivLu().solve(-1.f*F);
            y = delta_y + y;
            residue = y(3) - y_est(3);
            i++;
        }
        
        // update capacitor states
        s1 = 2.f * y(0) - s1;
        s2 = 2.f * y(1) - s2;
        s3 = 2.f * y(2) - s3;
        s4 = 2.f * y(3) - s4;
        s5 = 2.f * y(4) - s5;
        
        // return the 4th stage (lowpass output) voltage
        return y(3);
        
	}
};

MIN_EXTERNAL(ladder);

Post

pscorbett wrote: Mon Feb 05, 2024 5:27 am I'm using the "Eigen" linear algebra library, and currently using the Full LU decomposition as the Ax=b solver. I tried partial LU and "colPivHouseholderQr" as well, but found they performed worse in my test program. My very crude C++ timer stated that one iteration of the NR loop took roughly 150 microseconds, and I found in my python prototype that I usually needed about 4 iterations to meet my stop conditions.
For best performance, you should generate a specialized solver. LU is probably the algorithm you want to go for, but a generic LU solver will waste a lot of time loading/storing values into a generic matrix, perhaps looking for pivots, and even just running the loops. All of this overhead can be avoided by writing code that only solves your particular system; the compiler can then optimize individual arithmetic operations, and register allocation (keeping as many of the elements as possible in registers as long as possible) is huge for small systems. That is, most likely what you are currently spending most of your CPU cycles on is pure overhead compared to the actually useful work of solving a small system.

Ideally you declare every non-zero element in the matrix as a separate variable, unroll the whole LU code (obviously skipping useless operations due to sparsity), with fixed pivots and then let compiler optimize it as straight-line code. To improve further, you can precompute everything that does not depend on the runtime values (or perhaps only depends on the sampling rate) and also separate the inner system where you need to perform NR, such that you can then factor the main system once and only refactor (or rank-n update; cost is about the same) the inner part for each iteration.

If that sounds like it's tedious and error-prone to do manually, then I would agree. What you ideally want is a small code-generator (can be python or whatever) that reads some description of the system, performs the busywork like unrolling LU and declaring a gazillion variables and dumps out C++ code that you then give to the compiler.
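
To make that concrete, here's a hand-written sketch of what such generated straight-line code could look like, for a hypothetical 3x3 system with one structural zero (J02 == 0) and fixed pivots; a real generator would emit the same kind of code for the full 5x5 Jacobian:

Code: Select all

// Straight-line solve of J*d = -F for a fixed sparsity pattern:
// no loops, no pivot search, every nonzero in its own scalar variable.
void solve3(float J00, float J01,              // J02 structurally zero
            float J10, float J11, float J12,
            float J20, float J21, float J22,
            float F0, float F1, float F2,
            float &d0, float &d1, float &d2)
{
    // forward elimination with fixed pivots
    float m10 = J10 / J00;
    float m20 = J20 / J00;
    float J11p = J11 - m10 * J01;
    float J21p = J21 - m20 * J01;  // J12, J22 unchanged because J02 == 0
    float m21 = J21p / J11p;
    float J22p = J22 - m21 * J12;
    float b0 = -F0;
    float b1 = -F1 - m10 * b0;
    float b2 = -F2 - m20 * b0 - m21 * b1;
    // back substitution
    d2 = b2 / J22p;
    d1 = (b1 - J12 * d2) / J11p;
    d0 = (b0 - J01 * d1) / J00;    // J02 term skipped entirely
}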

Post

Wow, I was afraid it might be this haha!! It looks like I have my work cut out for me!
This sounds like it will be worth the effort though. Do you have any tips on where or how to learn about constructing a code generator? I've heard this idea floated a few times before. I think Andy mentioned it in his ADC talk as well. Theoretically, I could write the generator in python and output a C++ solver function specific to my system, right?

Post

Another potential problem here is that NR doesn't like saturating functions like tanh too much (especially in a zero-delay feedback setting, where the 2nd derivative can shoot into outer space at high cutoffs/resonances), and here we have six of those.

Post

Good point! You mean the horizontal asymptotes? NR didn't seem to be struggling in my python prototype. I tracked the maximum number of iterations and it never exceeded 4 across all test values (different conditions from different inputs and parameters), which seemed pretty reasonable to me. Is 4 a lot though? It's certainly not an insignificant amount of processing. I could maybe relax the stopping condition a little bit.

Edit: just reread that and noticed you had said second derivative. The Jacobian matrix only uses 1st partial derivatives, so I don't think that comes into play here. The first and second derivatives have a horizontal asymptote at y=0 as x approaches positive and negative infinity, and of course tanh itself has asymptotes at y=+/-1. I guess the derivative has difficulty projecting towards a horizontal asymptote, but I think it should yield almost the right value, since the asymptotes are approached quite quickly.
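
For reference, the derivatives in question (the first is exactly the 1 - tanh^2 "help" term in the code above):

\frac{d}{dx}\tanh x = 1 - \tanh^2 x, \qquad \frac{d^2}{dx^2}\tanh x = -2\tanh x\,(1 - \tanh^2 x)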

Post

mystran wrote: Mon Feb 05, 2024 3:44 pm Ideally you declare every non-zero element in the matrix as a separate variable, unroll the whole LU code (obviously skipping useless operations due to sparsity), with fixed pivots and then let compiler optimize it as straight-line code. To improve further, you can precompute everything that does not depend on the runtime values (or perhaps only depends on the sampling rate) and also separate the inner system where you need to perform NR, such that you can then factor the main system once and only refactor (or rank-n update; cost is about the same) the inner part for each iteration.

If that sounds like it's tedious and error-prone to do manually, then I would agree. What you ideally want is a small code-generator (can be python or whatever) that reads some description of the system, performs the busywork like unrolling LU and declaring a gazillion variables and dumps out C++ code that you then give to the compiler.
I got to thinking this through. Admittedly, I don't know how to write a python script that writes C++ code, although that might be fun and useful to learn. As maybe a stopgap, I was thinking this might be a good point to bust out GNU Octave and let it do the symbolic calculations for me. Then I could format them and dump them into my C++ code as a custom function specific to solving this matrix/system.

Also, and this may be very misguided, so bear with me... but originally, my system was to be solved with the inverse Jacobian multiplied by the F vector, and I'd moved away from that to avoid the overhead of matrix inversions. However, since my system has static dimensions, with the sparsity/zeros you mentioned, I might be able to get away with solving the 5x5 matrix inversion symbolically, and maybe even the subsequent matrix subtraction. Then I could either try to substitute the resulting algebra from the matrix/vector indices into the actual equations... or maybe just leave them referencing the variables for each element. I'm not sure whether this would be less computationally expensive than actually solving the system using LU decomposition. I could probably try both symbolically and see which results in lighter math.

Post

Even though NR uses 1st derivatives, its convergence is determined by the 2nd derivative. In my own experiments with tanh nonlinearities in the zero-delay feedback context there was a sharp knee in the resulting function at high cutoff/resonance values. Of course you can simply limit the number of iterations, but then, do you have an estimate of what your resulting precision is? This precision will most likely depend on the parameters and possibly even on the state, and in the worst case it'll actually diverge rather than converge.
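
Concretely, for a scalar Newton iteration the standard error recurrence is

e_{n+1} = \frac{f''(\xi_n)}{2\,f'(x_n)}\, e_n^2

so where |f''/f'| gets large (the sharp knee), the basin of quadratic convergence shrinks and a poor starting estimate can stall or diverge.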

Post

Thanks for the clarification. I just graphed this in Desmos and you are absolutely correct. I don't have a well fleshed out estimate of my precision. My stopping criterion is basically just a residue of the output voltage that must be a very small float sufficiently close to zero. I wasn't even considering the intermediary nodes. I will try pushing my prototype a little harder when I'm home from work and see if there are any conditions that "break" it and force it to take more than 4 iterations.

Out of curiosity, do you think it is worthwhile to try to find a better initial estimate than the previous output, or is that not worth the effort?
