Random cheap sigmoid

DSP, Plugin and Host development discussion.

Post

Sort of randomly, I realised that x/sqrt(x^2+1) makes for a pretty nice cheap sigmoid when you have a CPU that can do reciprocal square roots. Here's a comparison against tanh and the well-known x/(abs(x)+1):

https://www.desmos.com/calculator/p57j53uvov

It seems like a really nice one for those situations where you don't really care about a particular shape, but want something that is reasonably close to linear at low values, asymptotically approaches unity, and preferably doesn't take too many cycles to compute. For whatever reason, it doesn't seem to be particularly popular though (as in, I can't even find it on the web)... any idea why?
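
For concreteness, the two cheap candidates as plain scalar code (a sketch; division stands in for the reciprocal square root, and the function names are just placeholders):

Code: Select all

#include <cmath>

float sig(float x)  { return x / std::sqrt(x*x + 1.0f); } // the proposed sigmoid
float soft(float x) { return x / (std::fabs(x) + 1.0f); } // the well-known one
// reference curve: std::tanh(x)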

Post

mystran wrote: Wed Mar 06, 2019 1:40 pm Sort of randomly, I realised that x/sqrt(x^2+1) makes for a pretty nice cheap sigmoid when you have a CPU that can do reciprocal square roots. Here's a comparison against tanh and the well-known x/(abs(x)+1):

https://www.desmos.com/calculator/p57j53uvov

It seems like a really nice one for those situations where you don't really care about a particular shape, but want something that is reasonably close to linear at low values, asymptotically approaches unity, and preferably doesn't take too many cycles to compute. For whatever reason, it doesn't seem to be particularly popular though (as in, I can't even find it on the web)... any idea why?
I don't like fast reciprocal square roots, because they are not consistent between different CPU models. :)

Post

One can also make this into a sigmoid that's a bit sharper than tanh():

softer(x) = x / sqrt(1 + x^2)
harder(x) = softer(x + 0.5*x^3)

The derivative of harder(x) is still bounded to 1 and the second and third derivatives are zero at x=0 (edit: actually the 4th derivative is zero as well, but the point is, it's pretty much linear at low values).
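
A quick check of that claim: substituting u = x + x^3/2 into the series softer(u) = u - u^3/2 + 3*u^5/8 - ... gives

harder(x) = x + x^3/2 - (x^3 + (3/2)*x^5)/2 + (3/8)*x^5 + O(x^7) = x - (3/8)*x^5 + O(x^7)

so the first deviation from linearity is the x^5 term.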

Post

mystran wrote: Wed Mar 06, 2019 2:59 pm One can also make this into a sigmoid that's a bit sharper than tanh():
softer(x) = x / sqrt(1 + x^2)
harder(x) = softer(x + 0.5*x^3)
really nice. next step: introduce a hardness parameter p:

harder(x) = softer(x + p*x^3)

https://www.desmos.com/calculator/9t551983eq

at p = 0.19, it's visually really close to tanh :D

Post

Kaby Lake i5 x64 benchmark:

"FastTanh" over 100.000 run(s)
Average = 6 microsecs, minimum = 4 microsecs, maximum = 40 microsecs, total = 641 microsecs

"std::tanh" over 100.000 run(s)
Average = 15 microsecs, minimum = 8 microsecs, maximum = 51 microsecs, total = 1494 microsecs

Not bad(!?) given it can trivially be SIMD optimised.

Code: Select all

#include <cmath>

float FastSigmoid(float x)
{
	// reciprocal form: the divide + sqrt can become a single rsqrt in SIMD
	return x * (1.0f / std::sqrt(x*x + 1.0f));
}

float FastTanh(float x)
{
	return FastSigmoid(x + 0.19f*x*x*x);
}

Post

Music Engineer wrote: Wed Mar 06, 2019 5:25 pm
mystran wrote: Wed Mar 06, 2019 2:59 pm One can also make this into a sigmoid that's a bit sharper than tanh():
softer(x) = x / sqrt(1 + x^2)
harder(x) = softer(x + 0.5*x^3)
really nice. next step: introduce a hardness parameter p:

harder(x) = softer(x + p*x^3)

https://www.desmos.com/calculator/9t551983eq

at p = 0.19, it's visually really close to tanh :D
Note that p=0.5 is the limit where the first derivative is still bounded to 1, which could be important in feedback systems.

For even harder clippers, one can add more terms without giving up the bounded first derivative, and the coefficients that maximise linearity around zero are really simple to find: just take the Taylor expansion of softer(x) around zero and ignore the alternating signs:

hard5(x) = softer(x + (1/2)*x^3 + (3/8)*x^5)
hard7(x) = softer(x + (1/2)*x^3 + (3/8)*x^5 + (5/16)*x^7)
...

It seems reasonable to assume that this converges to a hard-clipper as the order is taken to infinity.
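
For reference, the binomial series in question is

softer(x) = x - (1/2)*x^3 + (3/8)*x^5 - (5/16)*x^7 + (35/128)*x^9 - ...

with coefficient magnitudes C(2n,n)/4^n; dropping the signs gives exactly the 1/2, 3/8, 5/16, ... terms above.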

Post

CurryPaste wrote: Wed Mar 06, 2019 5:56 pm Not bad(!?) given it can trivially be SIMD optimised.
The idea is that you replace the division and sqrt with the CPU-native rsqrt plus a single Newton iteration (which gets you pretty much to full single precision).
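
For reference, the standard Newton refinement for r ≈ 1/sqrt(x) is r <- 0.5*r*(3 - x*r*r); that is exactly the step used in the SSE code further down the thread.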

Post

As for tanh.. let's get a bit more scientific:

tanh(x) = sinh(x) / cosh(x) = sinh(x) / sqrt(1 + sinh(x)^2)

so given f(x) = x / sqrt(1 + x^2), we have f(sinh(x)) = tanh(x)!

Taking the Taylor expansion of sinh(x) around zero, we get x + x^3/6 + x^5/120 + ...

This is enough to make tanh accurate to a bit over 3 decimals: https://www.desmos.com/calculator/eqdkytakqi
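
A minimal scalar sketch of that composition (plain sqrt for clarity; the function name is a placeholder):

Code: Select all

#include <cmath>

// tanh(x) = f(sinh(x)) with f(s) = s / sqrt(1 + s^2);
// sinh approximated by its first three Taylor terms
float TanhViaSinh(float x)
{
	float x2 = x*x;
	float s  = x * (1.0f + x2*(1.0f/6.0f + x2*(1.0f/120.0f)));
	return s / std::sqrt(1.0f + s*s);
}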

Post

mystran wrote: Note that p=0.5 is the limit where the first derivative is still bounded to 1, which could be important in feedback systems.
yes, good point. i'll keep this in mind. thanks also for the series expansions. cheap sigmoids are always welcome!

Post

another (not so cheap) nice one is:

tanh(sinh(x))

or with variable "hardness" parameter "a"

tanh((1.0 / a) * sinh( a * x ))

with:

0 < a < ~2

or so...

or some Taylor/Padé approximation thereof, but those don't behave well for large input arguments...

a = sqrt(2.0) if you want to keep d/dx limited to exactly 1.0

Larger a is fine/interesting too if you don't care about such things.
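
As a minimal sketch (the function name is a placeholder):

Code: Select all

#include <cmath>

// tanh((1/a) * sinh(a*x)); a = sqrt(2) keeps d/dx bounded by exactly 1
double SinhSigmoid(double x, double a)
{
	return std::tanh(std::sinh(a * x) / a);
}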

Post

For adjustable versions with more terms (e.g. for the hard7 above), it helps to use higher powers of the parameter for the higher-degree terms: https://www.desmos.com/calculator/fbnphiusvb

Post

One nice thing about the particular form x/sqrt(x^2 + 1) is that you can antialias according to http://dafx16.vutbr.cz/dafxpapers/20-DA ... _41-PN.pdf in closed form. :wink:
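
For reference, the antiderivative of x/sqrt(x^2 + 1) is F(x) = sqrt(x^2 + 1), so a first-order scheme in the spirit of that paper might look like this (a sketch; the name and the fallback threshold are arbitrary):

Code: Select all

#include <cmath>

// first-order ADAA: output (F(x) - F(x1)) / (x - x1) with F(x) = sqrt(x^2 + 1),
// where x1 holds the previous input sample
double AdaaSigmoid(double x, double& x1)
{
	double y;
	if (std::fabs(x - x1) < 1e-6) // ill-conditioned: evaluate plainly at the midpoint
	{
		double m = 0.5 * (x + x1);
		y = m / std::sqrt(m*m + 1.0);
	}
	else
	{
		y = (std::sqrt(x*x + 1.0) - std::sqrt(x1*x1 + 1.0)) / (x - x1);
	}
	x1 = x;
	return y;
}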

Post

x/sqrt(x^2+1) is one of the functions listed in the Wikipedia article on sigmoid functions.
It's not unpopular, I think; it's just useful to a handful of applications.

Post

Benchmark, Kaby Lake i5 x64:

The SSE2 tanh approximation below (using a single Newton iteration) is almost exactly twice as fast per call as std::tanh; with four lanes per call, that's 8 values computed in the time std::tanh computes 1.

Code: Select all

#include <immintrin.h>

// refine the ~12-bit _mm_rsqrt_ps estimate with one Newton step:
// r <- 0.5 * r * (3 - x*r*r)
inline __m128 RSqrt(__m128 x)
{
	__m128 r = _mm_rsqrt_ps(x);
	r = _mm_mul_ps(
		_mm_mul_ps(_mm_set1_ps(0.5f), r),
		_mm_sub_ps(_mm_set1_ps(3.0f), _mm_mul_ps(_mm_mul_ps(x, r), r)));
	return r;
}

// x / sqrt(x^2 + 1), four lanes at a time
inline __m128 FastSigmoid(__m128 x)
{
	return _mm_mul_ps(x, RSqrt(_mm_add_ps(_mm_mul_ps(x, x), _mm_set1_ps(1.0f))));
}

// FastSigmoid(x + 0.19*x^3), visually close to tanh
inline __m128 FastTanh(__m128 x)
{
	return FastSigmoid(_mm_add_ps(x, _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.19f), x), _mm_mul_ps(x, x))));
}
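
A hypothetical usage example (buffer length assumed to be a multiple of 4; the function name is made up):

Code: Select all

// process a float buffer four samples at a time
void SaturateBuffer(float* buf, int n)
{
	for (int i = 0; i < n; i += 4)
		_mm_storeu_ps(buf + i, FastTanh(_mm_loadu_ps(buf + i)));
}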

Post

Here's something similar. Inside a limited range it acts fairly close to tanh.

Code: Select all

#include <cmath>

// polynomial fit on |x|, odd symmetry restored via copysign
inline double Something_Like_Tanh(double x)
{
	x *= 1.2;

	const double d  = std::fabs(x);
	const double d2 = d*d;
	return std::copysign(d - 0.375*d2 + 0.0625*d2*d - 0.00390625*d2*d2, x);
}
