Random cheap sigmoid
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
Sort of randomly, I realised that x/sqrt(x^2+1) makes for a pretty nice cheap sigmoid, when you have a CPU that can do reciprocal square roots. Here's a comparison against tanh and the well-known x/(abs(x)+1):
https://www.desmos.com/calculator/p57j53uvov
It seems like a really nice one for those situations where you don't really care for a particular shape, but want something that is reasonably close to linear at low values, asymptotically approaches unity and preferably doesn't take too many cycles to compute. For whatever reason, it doesn't seem like this is particularly popular though (as in.. can't even find it on the web)... any idea why?
-
- KVRian
- 631 posts since 21 Jun, 2013
mystran wrote: ↑Wed Mar 06, 2019 1:40 pm Sort of randomly, I realised that x/sqrt(x^2+1) makes for a pretty nice cheap sigmoid, when you have a CPU that can do reciprocal square roots.
I don't like fast reciprocal square roots, because they are not consistent between different CPU models.
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
One can also make this into a sigmoid that's a bit sharper than tanh():
softer(x) = x / sqrt(1 + x^2)
harder(x) = softer(x + 0.5*x^3)
The derivative of harder(x) is still bounded to 1 and the second and third derivatives are zero at x=0 (edit: actually the 4th derivative is zero as well, but the point is, it's pretty much linear at low values).
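For reference, the two shapers above can be sketched in plain C++ (scalar doubles, no speed tricks; the function names just mirror the formulas):

```cpp
#include <cmath>

// softer(x) = x / sqrt(1 + x^2): the basic cheap sigmoid
inline double softer(double x)
{
    return x / std::sqrt(1.0 + x*x);
}

// harder(x) = softer(x + 0.5*x^3): sharper than tanh, slope still
// bounded by 1, and nearly linear around x = 0
inline double harder(double x)
{
    return softer(x + 0.5*x*x*x);
}
```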
-
Music Engineer
- KVRAF
- 4292 posts since 8 Mar, 2004 from Berlin, Germany
really nice. next step: introduce a hardness parameter p:
harder(x) = softer(x + p*x^3)
https://www.desmos.com/calculator/9t551983eq
at p = 0.19, it's visually really close to tanh
-
- KVRist
- 45 posts since 7 Jul, 2012
Kaby Lake i5 x64 benchmark:
"FastTanh" over 100.000 run(s)
Average = 6 microsecs, minimum = 4 microsecs, maximum = 40 microsecs, total = 641 microsecs
"std::tanh" over 100.000 run(s)
Average = 15 microsecs, minimum = 8 microsecs, maximum = 51 microsecs, total = 1494 microsecs
Not bad(!?) given it can trivially be SIMD optimised.
"FastTanh" over 100.000 run(s)
Average = 6 microsecs, minimum = 4 microsecs, maximum = 40 microsecs, total = 641 microsecs
"std::tanh" over 100.000 run(s)
Average = 15 microsecs, minimum = 8 microsecs, maximum = 51 microsecs, total = 1494 microsecs
Not bad(!?) given it can trivially be SIMD optimised.
Code:
#include <cmath>

float FastSigmoid(float x)
{
    // x / sqrt(x^2 + 1)
    return x / sqrtf(x*x + 1.0f);
}

float FastTanh(float x)
{
    // p = 0.19 makes it visually close to tanh
    return FastSigmoid(x + 0.19f*x*x*x);
}
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
Music Engineer wrote: ↑Wed Mar 06, 2019 5:25 pm really nice. next step: introduce a hardness parameter p:
harder(x) = softer(x + p*x^3)
https://www.desmos.com/calculator/9t551983eq
at p = 0.19, it's visually really close to tanh
Note that p=0.5 is the limit where the first derivative is still bounded to 1, which could be important in feedback systems.
For even harder clippers, one can add more terms without giving up the bounded first derivative and the coefficients that maximise linearity around zero are actually really simple to find. You just take the Taylor expansion of softer(x) around zero and ignore the alternating signs:
hard5(x) = softer(x + 1/2.*x^3 + 3/8.*x^5)
hard7(x) = softer(x + 1/2.*x^3 + 3/8.*x^5 + 5/16.*x^7)
...
It seems reasonable to assume that this converges to a hard-clipper as the order is taken to the limit at infinity.
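A quick scalar C++ sketch of those higher-order clippers (the hard5/hard7 names come from the post; the Horner form is just for efficiency):

```cpp
#include <cmath>

inline double softer(double x) { return x / std::sqrt(1.0 + x*x); }

// Coefficients are the Taylor coefficients of softer(x) around zero
// with the alternating signs dropped: 1, 1/2, 3/8, 5/16, ...
inline double hard5(double x)
{
    const double x2 = x*x;
    return softer(x*(1.0 + x2*(0.5 + x2*0.375)));
}

inline double hard7(double x)
{
    const double x2 = x*x;
    return softer(x*(1.0 + x2*(0.5 + x2*(0.375 + x2*0.3125))));
}
```

Each extra term pushes the curve closer to a hard clipper while staying nearly linear near zero.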
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
The idea is you replace the division and sqrt with the CPU-native rsqrt and a single Newton iteration (which gets you to pretty much full single-precision).
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
As for tanh.. let's get a bit more scientific:
tanh(x) = sinh(x) / cosh(x) = sinh(x) / sqrt(1 + sinh(x)^2)
so given f(x) = x / sqrt(1 + x^2) we have f(sinh(x)) = tanh(x)!
Taking the Taylor expansion of sinh(x) around zero, we get: x + x^3/6 + x^5/120 + ...
This is enough to make tanh accurate to a bit over 3 decimals: https://www.desmos.com/calculator/eqdkytakqi
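That series turns into code directly (scalar C++ sketch; the truncated sinh keeps the three terms from the post):

```cpp
#include <cmath>

// f(sinh(x)) = tanh(x) with f(x) = x/sqrt(1+x^2), using a truncated
// series sinh(x) ~ x + x^3/6 + x^5/120
inline double tanh_approx(double x)
{
    const double x2 = x*x;
    const double s = x*(1.0 + x2*(1.0/6.0 + x2*(1.0/120.0)));  // ~sinh(x)
    return s / std::sqrt(1.0 + s*s);
}
```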
-
Music Engineer
- KVRAF
- 4292 posts since 8 Mar, 2004 from Berlin, Germany
mystran wrote: ↑Note that p=0.5 is the limit where the first derivative is still bounded to 1, which could be important in feedback systems.
yes, good point. i'll keep this in mind. thanks also for the series expansions. cheap sigmoids are always welcome!
- KVRAF
- 2621 posts since 12 Sep, 2008
another (not so cheap) nice one is:
tanh(sinh(x))
or with variable "hardness" parameter "a"
tanh((1.0 / a) * sinh( a * x ))
with:
0 < a < ~2
or so...
or some taylor/pade approx thereof, but the approximations of it don't behave well for large input arguments...
a = sqrt(2.0) if you want to keep d/dx limited to exactly 1.0
Larger a is fine/interesting too if you don't care about such things.
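A direct (not fast) implementation of that shaper, just to pin the formula down; `a` is the hardness parameter from the post:

```cpp
#include <cmath>

// Variable-hardness sigmoid: tanh((1/a) * sinh(a * x)).
// a = sqrt(2) keeps the maximum slope at exactly 1; larger a clips harder.
inline double sigmoid_a(double x, double a)
{
    return std::tanh(std::sinh(a * x) / a);
}
```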
- KVRAF
- Topic Starter
- 7892 posts since 12 Feb, 2006 from Helsinki, Finland
For adjustable versions with more terms (eg. for the hard7 above), it helps to use higher powers of the parameter for the higher degree terms: https://www.desmos.com/calculator/fbnphiusvb
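The exact parameterisation is in the Desmos link, but one plausible reading of "higher powers of the parameter for the higher degree terms" is to scale the degree-(2k+1) coefficient by p^k, so p=1 recovers hard7 and p=0 recovers plain softer(). A hypothetical sketch:

```cpp
#include <cmath>

inline double softer(double x) { return x / std::sqrt(1.0 + x*x); }

// Hypothetical parameterisation: the coefficient of x^(2k+1) is
// scaled by p^k, giving a smooth morph from softer (p=0) to hard7 (p=1)
inline double hard7_p(double x, double p)
{
    const double px2 = p*x*x;
    return softer(x*(1.0 + px2*(0.5 + px2*(0.375 + px2*0.3125))));
}
```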
-
- KVRist
- 51 posts since 16 Mar, 2014
One nice thing about the particular form x/sqrt(x^2 + 1) is that you can antialias it according to http://dafx16.vutbr.cz/dafxpapers/20-DA ... _41-PN.pdf in closed form.
-
- KVRist
- 45 posts since 7 Jul, 2012
Benchmark, Kaby Lake i5 x64:
The SSE2 tanh approximation below (using a single Newton iteration) is almost exactly twice as fast per call as std::tanh - and since each call processes 4 lanes, it computes 8 values in the time std::tanh computes one.
Code:
#include <xmmintrin.h>

inline __m128 RSqrt(__m128 x)
{
    // Hardware estimate refined by one Newton-Raphson step:
    // r' = 0.5 * r * (3 - x * r * r)
    __m128 r = _mm_rsqrt_ps(x);
    r = _mm_mul_ps(
        _mm_mul_ps(_mm_set1_ps(0.5f), r),
        _mm_sub_ps(_mm_set1_ps(3.0f), _mm_mul_ps(_mm_mul_ps(x, r), r)));
    return r;
}

inline __m128 FastSigmoid(__m128 x)
{
    // x / sqrt(x^2 + 1)
    return _mm_mul_ps(x, RSqrt(_mm_add_ps(_mm_mul_ps(x, x), _mm_set1_ps(1.0f))));
}

inline __m128 FastTanh(__m128 x)
{
    // FastSigmoid(x + 0.19*x^3)
    return FastSigmoid(_mm_add_ps(x,
        _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.19f), x), _mm_mul_ps(x, x))));
}
-
- KVRAF
- 1609 posts since 13 Oct, 2003 from Oulu, Finland
Here's something similar. Inside a limited range it acts fairly close to tanh.
#include <cmath>

inline double Something_Like_Tanh(double x)
{
    // Odd polynomial fit; only valid inside a limited input range
    x *= 1.2;
    const double d = fabs(x);
    const double d2 = d*d;
    return copysign(d - 0.375*d2 + 0.0625*d2*d - 0.00390625*d2*d2, x);
}