Random cheap sigmoid
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
Sort of randomly, I realised that x/sqrt(x^2+1) makes for a pretty nice cheap sigmoid, when you have a CPU that can do reciprocal square roots. Here's a comparison against tanh and the well-known x/(abs(x)+1):
https://www.desmos.com/calculator/p57j53uvov
It seems like a really nice one for those situations where you don't really care for a particular shape, but want something that is reasonably close to linear at low values, asymptotically approaches unity and preferably doesn't take too many cycles to compute. For whatever reason, it doesn't seem like this is particularly popular though (as in.. can't even find it on the web)... any idea why?
-
- KVRian
- 631 posts since 21 Jun, 2013
mystran wrote: ↑Wed Mar 06, 2019 1:40 pm Sort of randomly, I realised that x/sqrt(x^2+1) makes for a pretty nice cheap sigmoid, when you have a CPU that can do reciprocal square roots.
I don't like fast reciprocal square roots, because they are not consistent between different CPU models.
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
One can also make this into a sigmoid that's a bit sharper than tanh():
softer(x) = x / sqrt(1 + x^2)
harder(x) = softer(x + 0.5*x^3)
The derivative of harder(x) is still bounded to 1 and the second and third derivatives are zero at x=0 (edit: actually the 4th derivative is zero as well, but the point is, it's pretty much linear at low values).
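For reference, the two shapers above can be sketched in plain C++ (scalar doubles, no speed tricks; the function names just mirror the formulas):

```cpp
#include <cmath>

// softer(x) = x / sqrt(1 + x^2): the basic cheap sigmoid
inline double softer(double x)
{
    return x / std::sqrt(1.0 + x*x);
}

// harder(x) = softer(x + 0.5*x^3): sharper than tanh, slope still
// bounded by 1, and nearly linear around x = 0
inline double harder(double x)
{
    return softer(x + 0.5*x*x*x);
}
```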
-
Music Engineer
- KVRAF
- 4292 posts since 8 Mar, 2004 from Berlin, Germany
really nice. next step: introduce a hardness parameter p:
harder(x) = softer(x + p*x^3)
https://www.desmos.com/calculator/9t551983eq
at p = 0.19, it's visually really close to tanh
-
- KVRist
- 45 posts since 7 Jul, 2012
Kaby Lake i5 x64 benchmark:
"FastTanh" over 100.000 run(s)
Average = 6 microsecs, minimum = 4 microsecs, maximum = 40 microsecs, total = 641 microsecs
"std::tanh" over 100.000 run(s)
Average = 15 microsecs, minimum = 8 microsecs, maximum = 51 microsecs, total = 1494 microsecs
Not bad(!?) given it can trivially be SIMD optimised.
"FastTanh" over 100.000 run(s)
Average = 6 microsecs, minimum = 4 microsecs, maximum = 40 microsecs, total = 641 microsecs
"std::tanh" over 100.000 run(s)
Average = 15 microsecs, minimum = 8 microsecs, maximum = 51 microsecs, total = 1494 microsecs
Not bad(!?) given it can trivially be SIMD optimised.
Code:
#include <cmath>

float FastSigmoid(float x)
{
    // x / sqrt(x^2 + 1)
    return x / sqrtf(x*x + 1.0f);
}

float FastTanh(float x)
{
    // p = 0.19 makes it visually close to tanh
    return FastSigmoid(x + 0.19f*x*x*x);
}
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
Music Engineer wrote: ↑Wed Mar 06, 2019 5:25 pm really nice. next step: introduce a hardness parameter p:
harder(x) = softer(x + p*x^3)
https://www.desmos.com/calculator/9t551983eq
at p = 0.19, it's visually really close to tanh
Note that p=0.5 is the limit where the first derivative is still bounded to 1, which could be important in feedback systems.
For even harder clippers, one can add more terms without giving up the bounded first derivative and the coefficients that maximise linearity around zero are actually really simple to find. You just take the Taylor expansion of softer(x) around zero and ignore the alternating signs:
hard5(x) = softer(x + 1/2.*x^3 + 3/8.*x^5)
hard7(x) = softer(x + 1/2.*x^3 + 3/8.*x^5 + 5/16.*x^7)
...
It seems reasonable to assume that this converges to a hard-clipper as the order is taken to the limit at infinity.
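A quick scalar C++ sketch of those higher-order clippers (the hard5/hard7 names come from the post; the Horner form is just for efficiency):

```cpp
#include <cmath>

inline double softer(double x) { return x / std::sqrt(1.0 + x*x); }

// Coefficients are the Taylor coefficients of softer(x) around zero
// with the alternating signs dropped: 1, 1/2, 3/8, 5/16, ...
inline double hard5(double x)
{
    const double x2 = x*x;
    return softer(x*(1.0 + x2*(0.5 + x2*0.375)));
}

inline double hard7(double x)
{
    const double x2 = x*x;
    return softer(x*(1.0 + x2*(0.5 + x2*(0.375 + x2*0.3125))));
}
```

Each extra term pushes the curve closer to a hard clipper while staying nearly linear near zero.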
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
The idea is you replace the division and sqrt with the CPU-native rsqrt and a single Newton iteration (which gets you to pretty much full single-precision).
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
As for tanh.. let's get a bit more scientific:
tanh(x) = sinh(x) / cosh(x) = sinh(x) / sqrt(1 + sinh(x)^2)
so given f(x) = x / sqrt(1 + x^2) we have f(sinh(x)) = tanh(x)!
Taking the Taylor expansion of sinh(x) around zero, we get: x + x^3/6 + x^5/120 + ...
This is enough to make tanh accurate to a bit over 3 decimals: https://www.desmos.com/calculator/eqdkytakqi
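That series turns into code directly (scalar C++ sketch; the truncated sinh keeps the three terms from the post):

```cpp
#include <cmath>

// f(sinh(x)) = tanh(x) with f(x) = x/sqrt(1+x^2), using a truncated
// series sinh(x) ~ x + x^3/6 + x^5/120
inline double tanh_approx(double x)
{
    const double x2 = x*x;
    const double s = x*(1.0 + x2*(1.0/6.0 + x2*(1.0/120.0)));  // ~sinh(x)
    return s / std::sqrt(1.0 + s*s);
}
```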
-
Music Engineer
- KVRAF
- 4292 posts since 8 Mar, 2004 from Berlin, Germany
mystran wrote: ↑Note that p=0.5 is the limit where the first derivative is still bounded to 1, which could be important in feedback systems.
yes, good point. i'll keep this in mind. thanks also for the series expansions. cheap sigmoids are always welcome!
- KVRAF
- 2621 posts since 12 Sep, 2008
another (not so cheap) nice one is:
tanh(sinh(x))
or with variable "hardness" parameter "a"
tanh((1.0 / a) * sinh( a * x ))
with:
0 < a < ~2
or so...
or some taylor/pade approx thereof, but the approximations of it don't behave well for large input arguments...
a = sqrt(2.0) if you want to keep d/dx limited to exactly 1.0
Larger a is fine/interesting too if you don't care about such things.
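A direct (not fast) implementation of that shaper, just to pin the formula down; `a` is the hardness parameter from the post:

```cpp
#include <cmath>

// Variable-hardness sigmoid: tanh((1/a) * sinh(a * x)).
// a = sqrt(2) keeps the maximum slope at exactly 1; larger a clips harder.
inline double sigmoid_a(double x, double a)
{
    return std::tanh(std::sinh(a * x) / a);
}
```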
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
For adjustable versions with more terms (eg. for the hard7 above), it helps to use higher powers of the parameter for the higher degree terms: https://www.desmos.com/calculator/fbnphiusvb
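The exact parameterisation is in the Desmos link, but one plausible reading of "higher powers of the parameter for the higher degree terms" is to scale the degree-(2k+1) coefficient by p^k, so p=1 recovers hard7 and p=0 recovers plain softer(). A hypothetical sketch:

```cpp
#include <cmath>

inline double softer(double x) { return x / std::sqrt(1.0 + x*x); }

// Hypothetical parameterisation: the coefficient of x^(2k+1) is
// scaled by p^k, giving a smooth morph from softer (p=0) to hard7 (p=1)
inline double hard7_p(double x, double p)
{
    const double px2 = p*x*x;
    return softer(x*(1.0 + px2*(0.5 + px2*(0.375 + px2*0.3125))));
}
```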
-
- KVRist
- 51 posts since 16 Mar, 2014
One nice thing about the particular form x/sqrt(x^2 + 1) is that you can antialias it according to http://dafx16.vutbr.cz/dafxpapers/20-DA ... _41-PN.pdf in closed form.
-
- KVRist
- 45 posts since 7 Jul, 2012
Benchmark, Kaby Lake i5 x64:
The SSE2 tanh approximation below (using a single Newton iteration) is almost exactly twice as fast per call as std::tanh - and since each call processes 4 lanes, it computes 8 values in the time std::tanh computes one.
Code:
#include <xmmintrin.h>

inline __m128 RSqrt(__m128 x)
{
    // Hardware estimate refined by one Newton-Raphson step:
    // r' = 0.5 * r * (3 - x * r * r)
    __m128 r = _mm_rsqrt_ps(x);
    r = _mm_mul_ps(
        _mm_mul_ps(_mm_set1_ps(0.5f), r),
        _mm_sub_ps(_mm_set1_ps(3.0f), _mm_mul_ps(_mm_mul_ps(x, r), r)));
    return r;
}

inline __m128 FastSigmoid(__m128 x)
{
    // x / sqrt(x^2 + 1)
    return _mm_mul_ps(x, RSqrt(_mm_add_ps(_mm_mul_ps(x, x), _mm_set1_ps(1.0f))));
}

inline __m128 FastTanh(__m128 x)
{
    // FastSigmoid(x + 0.19*x^3)
    return FastSigmoid(_mm_add_ps(x,
        _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.19f), x), _mm_mul_ps(x, x))));
}
-
- KVRAF
- 1609 posts since 13 Oct, 2003 from Oulu, Finland
Here's something similar. Inside a limited range it acts fairly close to tanh.
#include <cmath>

inline double Something_Like_Tanh(double x)
{
    // Odd polynomial fit; only valid inside a limited input range
    x *= 1.2;
    const double d = fabs(x);
    const double d2 = d*d;
    return copysign(d - 0.375*d2 + 0.0625*d2*d - 0.00390625*d2*d2, x);
}