KVR Audio

mistertoast · Post by **mistertoast** » Wed Sep 16, 2009 2:13 pm

Thanks. On my todo list is to write some tools for myself to do this. I like what you've done. Very inspiring.

mistertoast · Post by **mistertoast** » Wed Sep 16, 2009 2:15 pm

I wonder if I messed my function up due to a degrees/radians problem or a domain or range error. Mine was right on top of tanh from -5 to 5, so I'm thinking I can do better if I look more closely at the outputs of the various approximations versus mine.

Might the area to the left of -5 or the right of 5 be significant? For all I know, I drift away badly beyond there.

nollock · Post by **nollock** » Wed Sep 16, 2009 7:45 pm

Christian Budde wrote: I am sure that I have done it right so far, as the shape itself for the resulting function is quite close to a tanh(). There might be some effects due to coefficient truncation, especially for the x^3 coefficient, but I doubt that it will change much.

Nope, that's right, I just checked myself.

If you plot it with a 40 Hz sine wave you see the difference more clearly: yours has a bunch of side lobes that fall off slowly. In mine the side lobes fall off really fast, but the first lobe scuppers it by being 18 dB higher than yours.

So yours (FastPower2ContinousError3) is definitely better.

Yours: [plot not preserved]

Mine: [plot not preserved]

I did do the "extra end point fitting" when I was investigating storing wavetables as polynomial segments, and in that case it made a huge difference, but I dunno, maybe it just doesn't work for this. Or maybe I'd need to spend more time tweaking the cost function.

Maybe the optimal case would be to actually do an FFT for each polynomial, check the side lobe levels, and use that as the cost function. But it'd probably take a couple of days to run. LOL.

mistertoast · Post by **mistertoast** » Wed Sep 16, 2009 8:01 pm

Are the rest of you doing tanh(2x)? I did tanh(x)! Is that why my harmonic falloff is different?

nollock · Post by **nollock** » Wed Sep 16, 2009 8:18 pm

mistertoast wrote:Are the rest of you doing tanh(2x)? I did tanh(x)! Is that why my harmonic falloff is different?

The images I did are for tanh(4x), as that showed the side lobes a little better. But the ones by Christian are (2x), I think.

Christian Budde · Post by **Christian Budde** » Wed Sep 16, 2009 8:35 pm

nollock wrote:
mistertoast wrote:Are the rest of you doing tanh(2x)? I did tanh(x)! Is that why my harmonic falloff is different?
The images i did are for tanh(4x), as that showed the side lobes a little better. But the ones by Christian are (2x) i think.

Yes, mine are tanh(2x). But I applied the same 'treatment' to your version as well, so the plots of your version are in fact of a tanh(2x). However, as you matched the function within a range of +/- 5, it should still be valid for this range.

Christian

mistertoast · Post by **mistertoast** » Wed Sep 16, 2009 8:45 pm

But I think my going after x rather than 2x may have handicapped me.

Can you try this one out? For this one, I did 2x. I want to see if that helps my harmonic amplitude at all.

Code: Select all

Y=X/(A+B*X*X)+X/(C+D*X*X)
  A =  0.38566590621E+01
  B =  0.11306387017E+00
  C =  0.59265249338E+00
  D =  0.79772974129E+00

And I notice my function creeps back toward zero to the left of -5 and to the right of 5. So maybe it helps to do this...

Code: Select all

if (x>5) {
  y=1;          // hard-limit above the fitted range
} else if (x<-5) {
  y=-1;         // hard-limit below the fitted range
} else {
  y=x/(A+B*x*x)+x/(C+D*x*x);
}
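As a rough check of the two snippets above, here is a minimal Python sketch that evaluates the posted fit against tanh(2x) on [-5, 5] and shows how the unclamped function drifts back toward zero further out. The coefficients are copied from the post; the helper names (`rational_fit`, `clamped_fit`, `max_abs_error`) and the grid size are assumptions made for illustration only.

```python
import math

# Coefficients copied from the post above (fit to tanh(2x)).
A = 0.38566590621e+01
B = 0.11306387017e+00
C = 0.59265249338e+00
D = 0.79772974129e+00

def rational_fit(x):
    """Unclamped two-term rational approximation of tanh(2x)."""
    return x / (A + B * x * x) + x / (C + D * x * x)

def clamped_fit(x, limit=5.0):
    """Same fit, hard-limited to +/-1 outside [-limit, limit]."""
    if x > limit:
        return 1.0
    if x < -limit:
        return -1.0
    return rational_fit(x)

def max_abs_error(lo=-5.0, hi=5.0, steps=2001):
    """Largest |fit - tanh(2x)| on an evenly spaced grid (hypothetical helper)."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(rational_fit(x) - math.tanh(2.0 * x)))
    return worst

if __name__ == "__main__":
    print("max |error| on [-5, 5]:", max_abs_error())
    for x in (5.0, 10.0, 20.0, 50.0):
        # The unclamped value falls away from 1 as x grows; the clamped one does not.
        print(x, rational_fit(x), clamped_fit(x))
```

Note that the fit is not exactly 1 at x=5, so the hard limit introduces a small step there.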

Thanks so much for your time, and for letting me try tanh(2x) like the other cool kids.

(And if that works well, I can get it down algebraically to one divide or use a reciprocal approximation to speed it up.)
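The "one divide" form mentioned above follows from putting both fractions over a common denominator: x/(A+B*x^2) + x/(C+D*x^2) = x*((A+C) + (B+D)*x^2) / (A*C + (A*D+B*C)*x^2 + B*D*x^4). A small Python sketch of that rearrangement, with the combined coefficient names (`N0`, `N1`, `D0`, `D1`, `D2`) chosen here purely for illustration:

```python
# Coefficients copied from the post above.
A = 0.38566590621e+01
B = 0.11306387017e+00
C = 0.59265249338e+00
D = 0.79772974129e+00

# Precomputed coefficients of the combined fraction (hypothetical names).
N0 = A + C
N1 = B + D
D0 = A * C
D1 = A * D + B * C
D2 = B * D

def two_divides(x):
    """Original form: two divisions per sample."""
    return x / (A + B * x * x) + x / (C + D * x * x)

def one_divide(x):
    """Algebraically equivalent form: one division per sample."""
    x2 = x * x
    return x * (N0 + N1 * x2) / (D0 + x2 * (D1 + D2 * x2))

if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.25, 1.0, 4.0):
        print(x, two_divides(x), one_divide(x))
```

The two forms agree up to floating-point rounding; whether the single divide is actually faster depends on the target CPU.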

mistertoast · Post by **mistertoast** » Wed Sep 16, 2009 9:05 pm

Christian Budde wrote:However, as you matched the function within a range of +/- 5 it should still be valid for this range.

What is "this range?" Can you tell me exactly what values are going in as input when you run your test? (High and low?)

Christian Budde · Post by **Christian Budde** » Thu Sep 17, 2009 4:00 am

Hi mistertoast,

I don't really need to plot that function as it is nearly identical to the other function you posted. It is maybe slightly closer, but not even 1 dB in the highest harmonic.

Usually the range you are using for matching (+/- 5 in my case) isn't that important, nor the fact that you are trying to fit tanh(2x) rather than tanh(x). It's just a scale factor for the input and even with 2x the input range is still -2..+2 and still within the approximated range.

Christian

PS: Both functions are charming, but not really close to a tanh. On the other hand, it is questionable whether this difference is audible at all, especially since analogue hardware hardly ever produces a true tanh() function (only if everything were near perfect).

mistertoast · Post by **mistertoast** » Thu Sep 17, 2009 4:17 am

OK. Thanks.

D1J1T · Post by **D1J1T** » Fri Sep 18, 2009 6:58 am

[edit: Christian and Dozius caught some errors, plus I found a(nother) mistake in the code. It's fixed now.]

Oddly enough, I did exactly this a few weeks ago using a Pade approximation with 4x polyphase oversampling. (The low level noise at the bottom is there to keep the polyphase filters from going into denormal mode)

I made a little VST plugin you can grab here. It just has an on/off switch and drive knob that goes from 1 to 32.

The approximation is: tanh(x) ≈ (105x+10x^3)/(105+45x^2+x^4) for -3.05059<x<3.05059
Outside of that range the output is held at +/-0.989883, meaning a maximum error of about 1%.
The approximating function has local maxima/minima at x=+/-3.05059 (the derivative there is zero), so the (overall) function and its first derivative are continuous.
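For readers who want to check the numbers without Synthmaker, here is a minimal scalar Python sketch of the same formula with the input clamped at +/-3.05059. It is not the SSE code from this post; the function name `pade_tanh` and the constant `LIMIT` are assumptions for illustration.

```python
import math

LIMIT = 3.05059  # clamp point quoted in the post (local max/min of the rational function)

def pade_tanh(x):
    """(105x + 10x^3) / (105 + 45x^2 + x^4), with x clamped to [-LIMIT, LIMIT]."""
    x = max(-LIMIT, min(LIMIT, x))
    x2 = x * x
    return x * (105.0 + 10.0 * x2) / (105.0 + x2 * (45.0 + x2))

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 3.05059, 10.0):
        print(x, pade_tanh(x), math.tanh(x), abs(pade_tanh(x) - math.tanh(x)))
```

Clamping the input rather than the output gives the same result here, because the function value at the clamp point is exactly the held output level.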

I wrote this code in SSE assembler for use in Synthmaker. "streamin" and "streamout" data types are four packed single precision floats (128 bits), i.e., the four upped samples. iirc, this takes about 40 cycles per (128 bit) sample on an Athlon X2, not counting the overhead for up/downsampling. ignore this part: Those first 3 lines are (3.4e+38|0.999999|0.1) to get a zero followed by 31 ones.

Code: Select all

//tanh approximation
//local max at {0.989883, {x -> 3.05059}}
streamin in;streamout out;
float F10=10;
float F45=45;
float F105=105;
float F3=3.05059;
float FM3=-3.05059;

movaps xmm0,in;
minps xmm0,F3;		//clamp x
maxps xmm0,FM3;		//between +/-3.05059
movaps xmm1,xmm0;	//copy to xmm1
mulps xmm1,xmm1;	//xmm1=x^2
movaps xmm2,xmm1;
mulps xmm2,xmm0;	//xmm2=x^3
movaps xmm3,xmm2;
mulps xmm3,xmm0;	//xmm3=x^4
movaps xmm4,F10;	//xmm4=10
movaps xmm5,F45;	//xmm5=45
movaps xmm6,F105;	//xmm6=105
mulps xmm0,xmm6;	//xmm0=105x
mulps xmm2,xmm4;	//xmm2=10x^3
addps xmm0,xmm2;	//xmm0=numerator
mulps xmm1,xmm5;	//xmm1=45x^2
addps xmm1,xmm3;	//xmm1=45x^2+x^4
addps xmm1,xmm6;	//xmm1=denominator
divps xmm0,xmm1;	//xmm0=num/denom
movaps out,xmm0;	//out=tanh(in)

Yikes! I feel like such a geek now.

Christian Budde · Post by **Christian Budde** » Fri Sep 18, 2009 7:36 am

Hi,

Well, if you compare your version to the ones I posted, it is far off. The overtone structure doesn't look like a tanh(), nor does a real tanh() contain even harmonics, which indicate asymmetries. I haven't tried your approximation, but I'm sure it would look worse even when compared by eye!
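The symmetry argument can be checked numerically: an odd function such as tanh() driven by a sine yields only odd harmonics, so measurable even harmonics point to an asymmetry in the shaper. A minimal Python sketch, assuming a drive of 2 and a plain DFT over one period; `harmonic_magnitude` and the deliberately lopsided `asymmetric` shaper are hypothetical helpers, not anything posted in this thread.

```python
import math

def harmonic_magnitude(shaper, k, drive=2.0, n=1024):
    """Magnitude of harmonic k of shaper(drive * sin(theta)) over one period."""
    re = 0.0
    im = 0.0
    for i in range(n):
        theta = 2.0 * math.pi * i / n
        y = shaper(drive * math.sin(theta))
        re += y * math.cos(k * theta)
        im += y * math.sin(k * theta)
    return 2.0 * math.hypot(re, im) / n

def asymmetric(x):
    """Hypothetical lopsided shaper: tanh with an offset added to its input."""
    return math.tanh(x + 0.3)

if __name__ == "__main__":
    for k in range(1, 8):
        print(k, harmonic_magnitude(math.tanh, k), harmonic_magnitude(asymmetric, k))
```

With the symmetric shaper the even-numbered rows come out at rounding-noise level, while the offset version shows clear even harmonics.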

Christian

Dozius · Post by **Dozius** » Fri Sep 18, 2009 7:37 am

D1J1T wrote: I wrote this code in SSE assembler for use in Synthmaker. "streamin" and "streamout" data types are four packed single precision floats (128 bits), i.e., the four upped samples. iirc, this takes about 40 cycles per (128 bit) sample on an Athlon X2, not counting the overhead for up/downsampling. Those first 3 lines are (3.4e+38|0.999999|0.1) to get a zero followed by 31 ones.

For an absolute mask it's faster just to use an int equal to 2147483647; then it becomes only one AND operation. Although, you don't even need to do an absolute at all in this situation. Instead of doing all that sign flipping, just do a min then a max and you're done.

This

Code: Select all

movaps xmm7,F3P4;
orps xmm7,F0P9;
orps xmm7,F0P1;	
movaps xmm0,in;		//xmm0=x
movaps xmm1,xmm0;	//xmm1=x
andps xmm0,xmm7;	//xmm0=|x|
minps xmm0,F3;		//clamp |x| to 3.05059
andnps xmm7,xmm1;	//get the original sign
orps xmm0,xmm7;	//restore it to the clamped values

becomes this

Code: Select all

movaps xmm0,in;
minps xmm0,F3;  //F3 = 3.05059
maxps xmm0,FM3; //FM3 = -3.05059
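To see that the two variants agree, here is a scalar Python sketch (one value at a time rather than four packed floats, so it only models the arithmetic, not the SSE instructions). It clears the sign bit with the 2147483647 (0x7FFFFFFF) mask, clamps, and restores the sign, then compares against a plain min/max. All function names are assumptions for illustration.

```python
import math
import struct

LIMIT = 3.05059

def abs_via_mask(x):
    """Clear the sign bit of a 32-bit float by ANDing with 0x7FFFFFFF."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0x7FFFFFFF))[0]

def clamp_sign_flip(x):
    """Long version: |x| -> min with LIMIT -> put the original sign back."""
    return math.copysign(min(abs_via_mask(x), LIMIT), x)

def clamp_min_max(x):
    """Short version: min with LIMIT, then max with -LIMIT."""
    return max(min(x, LIMIT), -LIMIT)

if __name__ == "__main__":
    # Inputs chosen to be exactly representable as 32-bit floats.
    for x in (-7.5, -3.5, -0.25, 0.0, 0.5, 2.75, 4.0, 100.0):
        print(x, clamp_sign_flip(x), clamp_min_max(x))
```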

Your code also doesn't match your approximation. The coefficients are different and you're missing some operations. I think maybe you pasted the wrong code?

Ivan_C · Post by **Ivan_C** » Fri Sep 18, 2009 7:38 am

This thread is great, thanks a lot Christian!

I have one question for you: what do you think of table lookup for approximating the tanh?

Christian Budde · Post by **Christian Budde** » Fri Sep 18, 2009 8:47 am

Wolfen666 wrote:I have one question for you, what do you think of table lookup for approximating the tanh ?

I haven't considered a table lookup, for a few reasons. First, you need a considerably large table for precise results, and then you still need some interpolation for values that fall between the stored values.

On modern CPUs this might cost more cycles due to memory latencies and throughput. You might also have cache misses if the table is too large. For smaller tables you need some good interpolation, which costs some cycles as well. It would be interesting to check this approach as well.
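To make the trade-off concrete, here is a minimal Python sketch of a table lookup with linear interpolation. The table size (256 entries) and range (+/-4) are assumptions picked for illustration, not values anyone in this thread proposed or measured, and the sketch says nothing about speed.

```python
import math

TABLE_SIZE = 256   # assumed size; larger tables cost more cache
X_MAX = 4.0        # assumed table range [-X_MAX, X_MAX]
STEP = 2.0 * X_MAX / (TABLE_SIZE - 1)
TABLE = [math.tanh(-X_MAX + i * STEP) for i in range(TABLE_SIZE)]

def tanh_lookup(x):
    """tanh(x) from TABLE with linear interpolation; input clamped to the table range."""
    x = max(-X_MAX, min(X_MAX, x))
    pos = (x + X_MAX) / STEP            # fractional table index
    i = min(int(pos), TABLE_SIZE - 2)   # keep i + 1 inside the table
    frac = pos - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

if __name__ == "__main__":
    worst = max(abs(tanh_lookup(k / 100.0) - math.tanh(k / 100.0))
                for k in range(-600, 601))
    print("max |error| on [-6, 6]:", worst)
```

Halving the table size roughly quadruples the interpolation error for linear interpolation, which is the size-versus-precision tension described above.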

However, for personal work I never really considered look-up tables, as they need more space, especially if the expected performance is likely to be equal.

I once did a comparison for the log10 approximation paper I wrote (which got lost before I could release it). It was kind of unfair because I compared it to an old DOS program, whose platform had several flaws. The approximation solution was slightly better, but we didn't optimize the table size or the interpolation algorithm.

Kind regards,

Christian

PS: Bear in mind that the optimal table size depends on the cache size of the processor!

Tanh approximations