mystran wrote: ↑Wed May 05, 2021 10:06 am
> While min/max is a no-brainer here, it got me wondering: is AVX the minimum needed to actually do generalized branches?
> Basically, with AVX one can use _mm_cmp_ps (or the wider equivalents) to get a mask for conditionally selecting values, then follow up with PTEST (_mm_test_all_zeros etc.; SSE4.1) to set CF or ZF, so a branch is only computed when at least one lane actually needs it. That makes Turing-equivalent computation possible without ever unpacking the vectors - but is there some trick to do the same with just SSE2 and a sensible amount of shuffling?

_mm_movemask_ps might do the trick.
Random cheap sigmoid
2DaT wrote: ↑Wed May 05, 2021 9:27 pm
> _mm_movemask_ps might do the trick.

Oh, I just realized the SSE versions of the comparisons are _mm_cmpXX_ps, which map to CMPPS, whereas _mm_cmp_ps maps to VCMPPS - basically the same thing, except the intrinsic syntax is different. Isn't it just a joy how consistent Intel is with these?!?

You're right though: _mm_movemask_ps will do the job, at the cost of an extra integer TEST to set the flags.
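A minimal SSE2-only sketch of the idea in C (the predicate and f_expensive here are placeholders, not anything from the thread): a per-lane compare produces the mask, _mm_movemask_ps compresses it into a general-purpose register, and an ordinary integer test on that value decides whether the expensive branch runs at all.

```c
#include <assert.h>
#include <xmmintrin.h>  // SSE float compares, movemask, bitwise ops

// Placeholder for some branch that is expensive enough to be worth skipping.
static __m128 f_expensive(__m128 x) { return _mm_mul_ps(x, x); }

// Bitwise blend: mask ? a : b. Pre-SSE4.1 there is no BLENDVPS, so this
// is the classic and/andnot/or idiom.
static __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

static __m128 process(__m128 x) {
    // Per-lane predicate: here, x < 0 (illustrative choice).
    __m128 mask = _mm_cmplt_ps(x, _mm_setzero_ps());
    // movemask puts one bit per lane into an integer; a plain scalar
    // test on it plays the role PTEST would play with SSE4.1.
    if (_mm_movemask_ps(mask) != 0) {
        return select_ps(mask, f_expensive(x), x);
    }
    return x;  // no lane needs the branch: skip it entirely
}
```

The trade-off versus PTEST is exactly the one mentioned above: movemask moves the mask to an integer register first, then the branch condition comes from an ordinary integer test.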
martinvicanek wrote: ↑Fri Mar 22, 2019 12:51 am
> You don't even need to worry about division if you rewrite
> (sqrt(1 + x^2) - sqrt(1 + x1^2))/(x - x1) = (x + x1)/(sqrt(1 + x^2) + sqrt(1 + x1^2))

Excuse my extremely rusty math. What happened here?
rafa1981 wrote: ↑Wed Jun 16, 2021 7:00 am
> Excuse my extremely rusty math. What happened here?

It should actually read "You don't even need to worry about division by zero". You still have a division, but the denominator is always >= 2.

To prove the above equality, multiply both the numerator and the denominator on the left-hand side by (sqrt(1 + x^2) + sqrt(1 + x1^2)), then use the identity x^2 - x1^2 = (x + x1)(x - x1).
martinvicanek wrote: ↑Wed Jun 16, 2021 10:44 am
> To prove the above equality, multiply both the numerator and the denominator on the left-hand side by (sqrt(1 + x^2) + sqrt(1 + x1^2)), then use the identity x^2 - x1^2 = (x + x1)(x - x1).

I see. Clever use of "the basics".
martinvicanek wrote: ↑Wed Jun 16, 2021 10:44 am
> It should actually read "You don't even need to worry about division by zero". You still have a division, but the denominator is always >= 2.

Two square roots and a division make it quite expensive to actually compute, though.
Actually, you can save the previous sqrt result along with the previous sample input (x1), so one sqrt goes away at the expense of one extra state variable.

This is doing antialiasing at very low latency (0.5 samples? 1 sample?) with almost no memory usage, so the definition of "cheap" is relative. I'm very bad at math and DSP, so I might be missing better ways.