Fast TANH approximation


Post

FastTriggerFish, your explanation is not quite precise. To my knowledge a step function has a full spectrum, so I would not work with it; you can't get a low-pass filter by stacking step functions on top of each other.

What's really important is to measure the magnitude of the discontinuity relative to the signal, because through the waveshaper function it scales together with the signal.

So, if you see a discontinuity, it means some of the higher-order polynomial terms (read above) do not zero out. And the magnitude of the discontinuity tells you the approximate magnitude of the higher-order polynomial coefficient.

Post

Aleksey Vaneev wrote: On my i7-3770K computer, using the latest Intel C++ Compiler, here are the times:
math.h tanh 6.91ns
tanh by original poster 3.78ns
vox_fasttanh 1.89ns
What did you use for clip in testing the op's tanh? I used a simple clamp function:

x <= min ? min : x >= max ? max : x
And my results looked like:
tanh by original poster 2.9ns
vox_fasttanh 6.9ns

This was using vc++ 08 though. Not questioning your results, simply trying to learn. :wink:

Post

Yes, I've used a simple if() clamp as well.

Intel C++ Compiler is really good at optimizing. You should also look at their FFT routines: they are the fastest available to date, and the high performance is consistent for most practical block sizes.

Post

Will do, thanks.

Post

Aleksey Vaneev wrote: FastTriggerFish, your explanation is not quite precise. To my knowledge a step function has a full spectrum, so I would not work with it; you can't get a low-pass filter by stacking step functions on top of each other.

What's really important is to measure the magnitude of the discontinuity relative to the signal, because through the waveshaper function it scales together with the signal.

So, if you see a discontinuity, it means some of the higher-order polynomial terms (read above) do not zero out. And the magnitude of the discontinuity tells you the approximate magnitude of the higher-order polynomial coefficient.
You should read what I wrote again.
It may be a language barrier issue but I never talked about stacking step functions.
All I said was: a signal with a discontinuity in the second derivative is the sum of a smooth signal + a Dirac impulse integrated 3 times.
Integrating 3 times means applying a third-order low-pass filter (with a pole at DC), so you are low-passing a Dirac impulse, which means your signal is not band-limited and you will have aliasing.
Of course the size of the discontinuity will determine how strong these harmonics are.
Now if you apply such a function to a band-limited signal, your resulting signal will also have a discontinuous 2nd-order derivative (by the chain rule), and so your resulting signal is not band-limited anymore.

I am quite hungover today but I'm still pretty sure I'm just stating basic signal processing facts here ;-)

Of course once again, in practice the effect may be small enough to not be unpleasant in this case, but it's definitely there.
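
The decomposition described in this post can be written out as a rough sketch. The symbols are assumptions introduced here for illustration and do not come from the thread: f is the signal, t_0 the location of the jump, \Delta the size of the jump in f'', u the unit step, s the remaining part with a continuous second derivative, g the waveshaper (assumed to have a continuous first derivative) and x its band-limited input.

```latex
% Signal whose second derivative jumps by \Delta at t_0:
f(t) = s(t) + \Delta \, \frac{(t - t_0)^2}{2} \, u(t - t_0),
\qquad
\frac{d^3}{dt^3} \left[ \frac{(t - t_0)^2}{2} \, u(t - t_0) \right] = \delta(t - t_0)

% Away from DC, three integrations of the impulse give a 1/\omega^3 roll-off:
\left| \mathcal{F} \left\{ \Delta \, \frac{(t - t_0)^2}{2} \, u(t - t_0) \right\} (\omega) \right|
  = \frac{|\Delta|}{|\omega|^3}, \qquad \omega \neq 0

% Waveshaper g applied to x(t), by the chain rule:
\frac{d^2}{dt^2} \, g(x(t)) = g''(x(t)) \, x'(t)^2 + g'(x(t)) \, x''(t)
```

Under these assumptions, if g'' jumps by \Delta_g at some level x_0, then each time x(t) crosses x_0 the second derivative of the output jumps by \Delta_g x'(t)^2. The resulting 1/\omega^3 tail (about 18 dB per octave) scales with the size of the jump but does not vanish at any finite frequency, which is why the output is no longer band-limited.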

Post

I must have mixed things up, as sonigen's tanh routine now also shows 1.90ns.

Post

+-2000 or +-RAND_MAX are ridiculous input ranges for tanh(), and at least on GCC they are completely skewing the results for the stdlib tanh().

There's a website called compileonline.com that you can use to compile and run C++ code with GCC. OK, it's not perfect because you can't be sure what the compiler settings are, but for this it doesn't really matter...

Using the benchmark code linked to by mystran with the original input range, i.e. from 0 to RAND_MAX...

tanh(x) 9.0 ns
x/(1+|x|) 7.0 ns

With the input range reduced to +-2000....

tanh(x) 7.0 ns
x/(1+|x|) 7.0 ns

With the input range reduced to +-20....

tanh(x) 29.0 ns
x/(1+|x|) 6.0 ns

With specific inputs...

tanh(21) 28ns
tanh(22) 4ns

As you can see large inputs to tanh() are short-cutting out before actually doing the real tanh() calculation. Why? Because if abs(x) is > 21 the result is so close to +-1 that it rounds to +-1. Although MSVC doesn't do that from what I can tell.

The point is you can add that optimization to your own functions, for example...

if (abs(x) > 1) return sign(x); // could be x/abs(x), or some bitmask

And in the benchmarks everyone here is using, 99 times out of 100 that's all that ever gets executed in the call to tanh().
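
As a minimal sketch of that early-out applied to the library function itself: the wrapper name `tanh_early_out` and the constant `kSaturation` are hypothetical, and the threshold of 22 is an assumption taken from the tanh(21) and tanh(22) timings above, not from any library source.

```cpp
#include <cmath>

// Hypothetical wrapper (not from the thread): skip the full tanh evaluation
// once |x| is large enough that the double result rounds to +-1.
// The threshold of 22 is an assumption based on the timings quoted above.
inline double tanh_early_out(double x)
{
    constexpr double kSaturation = 22.0;
    if (std::fabs(x) > kSaturation)
        return std::copysign(1.0, x);  // saturated: return the sign of x
    return std::tanh(x);               // otherwise do the real calculation
}
```

With random inputs spread over +-2000 or 0..RAND_MAX, nearly every call would take the first return, so a benchmark over such a range mostly measures the comparison rather than the tanh calculation.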
Chris Jones
www.sonigen.com

Post

Also, if you're doing plain C++ code, no asm, it'll probably be better to use this as a clipping function, as it's branchless...

x = x*0.5f;
x = abs(x+0.5f) - abs(x-0.5f);
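
A small sketch of why those two lines clip to [-1, 1]: |x/2 + 0.5| - |x/2 - 0.5| equals 0.5 * (|x + 1| - |x - 1|), which is x inside the range and +-1 outside it. The function names `clip_abs` and `clip_branch` are made up for this example, and whether the result is really branch-free depends on how the compiler implements `std::fabs`.

```cpp
#include <cmath>

// Clip to [-1, 1] using only absolute values, as in the two lines above.
inline float clip_abs(float x)
{
    x = x * 0.5f;
    return std::fabs(x + 0.5f) - std::fabs(x - 0.5f);
}

// Reference clamp written with comparisons, for checking only.
inline float clip_branch(float x)
{
    return x <= -1.0f ? -1.0f : x >= 1.0f ? 1.0f : x;
}
```

For example, an input of 2.0f gives |1.5| - |0.5| = 1.0f, and an input of -0.25f gives |0.375| - |-0.625| = -0.25f.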

In fact I tested a plain C++ version with the abs clip against the SSE version and it was only a little bit slower...

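#include <cmath>

float TESWITHABSCLIP(float x)
{
    x = x / 3.4f;                                   // scale the input
    x = x * 0.5f;
    x = std::fabs(x + 0.5f) - std::fabs(x - 0.5f);  // abs clip to [-1, 1]
    x = (std::fabs(x) - 2) * x;                     // first shaping pass (flips the sign)
    return (std::fabs(x) - 2) * x;                  // second pass flips it back
}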

that clocked 5.8ns vs 5.2 for the SSE version.
Chris Jones
www.sonigen.com

Post

OK, I've reduced the range to -20..20, that's a practical range for audio. Here is the full benchmark (except the erf() function):

function                  time per call   sum of results
"x"                       0.50 ns         -37.305356
"atan(pi*x/2)*2/pi"       9.91 ns         -3.579481
"atan(x)"                 8.54 ns         -5.487886
"1/(1+exp(-x))"           4.65 ns         499999998.134898
"1/sqrt(1+x^2)"           3.83 ns         299822295.040010
"x/(1+|x|)"               1.89 ns         -3.391294
"tanh(x)"                 7.55 ns         -3.730554
"fasttanh_old(x)"         3.81 ns         -3.730052
"fasttanh_sonigen(x)"     1.89 ns         -3.730502
"vox_fasttanh_ultra(x)"   1.89 ns         -3.730640

Post

I've added Aleksey's tanh to my benchmarks and the results are (times in ns)...

stdlib TANH        140
1/(abs(x)+1)       7.3
sonigen tanh       5.2
sonigen fastv      4.2
abs clip version   5.8
vox_fasttanh       6.5

notes:

The abs clip version is the same as mine in the OP but with the abs clipper, and is plain C++ code.

Aleksey's "vox_fasttanh" is also plain C++.

Others are asm versions.
Last edited by sonigen on Fri Aug 09, 2013 8:00 pm, edited 2 times in total.
Chris Jones
www.sonigen.com

Post

The summing I do after each function evaluation is reliable, because the compiler cannot predict the results of non-linear functions. What's strange is that I get 1.89ns for various functions, and I use high-performance counters for time measurements.

Post

Aleksey Vaneev wrote: The summing I do after each function evaluation is reliable, because the compiler cannot predict the results of non-linear functions. What's strange is that I get 1.89ns for various functions, and I use high-performance counters for time measurements.
Sorry, I clipped that bit out of my post when editing. I have an accumulating result in my benchmark too, but unless that result was used to do something meaningful after the loop, like a printf, MSVC was chopping it all out. It didn't do it with the asm functions, but it did with the plain C++ ones and with tanh().

(Also, I fixed an error that was adversely affecting my timings for your code, because I'd not converted a few floats to doubles in the benchmark code.)
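
The point about the accumulated result can be sketched roughly as follows. This is not the benchmark code used in the thread: the helper names `time_per_call_ns` and `report` are hypothetical, and `std::chrono::steady_clock` stands in for the high-performance counters mentioned above.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical timing helper (names are illustrative, not from the thread).
// Returns the average nanoseconds per call and writes the accumulated sum
// to sum_out so that the caller can print it.
template <typename F>
double time_per_call_ns(F f, const std::vector<double>& inputs, double& sum_out)
{
    using clock = std::chrono::steady_clock;
    double sum = 0.0;
    const auto start = clock::now();
    for (double x : inputs)
        sum += f(x);  // accumulate every result
    const auto stop = clock::now();
    sum_out = sum;
    const double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return inputs.empty() ? 0.0 : total_ns / static_cast<double>(inputs.size());
}

// Printing the sum gives the optimizer an observable use of every result,
// so it cannot drop the loop as dead code.
template <typename F>
void report(const char* name, F f, const std::vector<double>& inputs)
{
    double sum = 0.0;
    const double ns = time_per_call_ns(f, inputs, sum);
    std::printf("\"%s\" %.2f ns | %f\n", name, ns, sum);
}
```

Printing the sum only stops the compiler from deleting the work. It does not guarantee that the measured interval contains nothing but the function calls, so results are still worth cross-checking across compilers and input ranges.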
Chris Jones
www.sonigen.com

Post

I do printf of the sum as shown above, of course.

Post

Aleksey Vaneev wrote:I do printf of the sum as shown above, of course.
Oops, didn't notice that. :oops:
Chris Jones
www.sonigen.com

Post

Wanted to correct myself: of course, a step function has a low-pass response. But I don't see how the waveshaper function's derivative relates to a step function. A formula would be useful.