Optimize plugin code for balanced load or least load?

DSP, Plugin and Host development discussion.

Post

mystran wrote: Thu Nov 21, 2019 5:27 am
JCJR wrote: Thu Nov 21, 2019 4:31 am If you process input then a misbehaving upstream plugin might feed you lots of nans?
If you want to sanitise input, you can use std::isnan() for MSVC and __isnan() for clang and GCC. Note that both clang and GCC can optimise std::isnan() into a NOP when using fast-math (which is about as retarded as it gets, but whatever), so you really have to use __isnan() instead.
What about std::isfinite()?

Post

Z1202 wrote: Thu Nov 21, 2019 10:44 am
mystran wrote: Thu Nov 21, 2019 5:27 am
JCJR wrote: Thu Nov 21, 2019 4:31 am If you process input then a misbehaving upstream plugin might feed you lots of nans?
If you want to sanitise input, you can use std::isnan() for MSVC and __isnan() for clang and GCC. Note that both clang and GCC can optimise std::isnan() into a NOP when using fast-math (which is about as retarded as it gets, but whatever), so you really have to use __isnan() instead.
What about std::isfinite()?
No idea, but probably the same as std::isnan(). I don't usually check for infinities explicitly, but rather clip very large input samples at a finite threshold, such that there is still some headroom before any internal computations would start producing infinities (ie. I sanitise input in such a way that output is known to stay finite). Unlike NaNs, infinities don't really require any special handling if you're clipping anyway.
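As a rough illustration of the sanitising approach described above (reject NaNs, then clip to a finite threshold), here is a minimal sketch. It swaps in a bit-pattern NaN test instead of `__isnan()`, on the assumption that `float` is a 32-bit IEEE-754 single; because the test is integer arithmetic, floating-point fast-math assumptions should not be able to remove it. The names `isNanBits` and `sanitiseSample` and the `1.0e6f` threshold are hypothetical, chosen only for the example.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE-754 single.
// A NaN has all exponent bits set and a non-zero mantissa, so with the
// sign bit masked off its bit pattern is strictly greater than infinity's.
inline bool isNanBits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);   // well-defined way to read the bits
    return (u & 0x7fffffffu) > 0x7f800000u;
}

// Replace NaNs with silence, then clip to a finite threshold so that
// later internal computations keep some headroom before overflowing.
// The default limit is a hypothetical value, not a recommendation.
inline float sanitiseSample(float x, float limit = 1.0e6f) {
    if (isNanBits(x)) return 0.0f;
    if (x > limit)  return limit;    // also catches +infinity
    if (x < -limit) return -limit;   // also catches -infinity
    return x;
}
```

Infinities need no separate check here, since the clip handles them like any other oversized sample.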

Post

JCJR wrote: Wed Nov 20, 2019 2:52 am delayIdx *= ((delayIdx += 1) < DelayLength);

Assuming jsfx compiles a branchless comparison, TRUE comparisons return 1 and FALSE comparisons return 0. Also jsfx seems to run fastest the fewer times you reference vars.

So the above line first incs delayIdx, then if the new [delayIdx < DelayLength] it multiplies the new delayIdx by 1 (no change), otherwise it multiplies the new delayIdx by 0, resetting the pointer to the buffer bottom.
Xcode gives me an "undetermined" compiler warning for delayIdx with this code. It seems to run OK, but I'm not sure it's a safe way to do this operation given that warning. Just FYI.

Post

Fender19 wrote: Fri Nov 22, 2019 9:40 pm
JCJR wrote: Wed Nov 20, 2019 2:52 am delayIdx *= ((delayIdx += 1) < DelayLength);

Assuming jsfx compiles a branchless comparison, TRUE comparisons return 1 and FALSE comparisons return 0. Also jsfx seems to run fastest the fewer times you reference vars.

So the above line first incs delayIdx, then if the new [delayIdx < DelayLength] it multiplies the new delayIdx by 1 (no change), otherwise it multiplies the new delayIdx by 0, resetting the pointer to the buffer bottom.
Xcode gives me an "undetermined" compiler warning for delayIdx with this code. It seems to run OK, but I'm not sure it's a safe way to do this operation given that warning. Just FYI.
Using Booleans as real values is generally frowned upon. Depending on the language, "false"/"true" can be 0/-1, 0/1, negative/positive, 0 or unreal/everything else (positive or negative), etc. I've seen flavors of the same language do it differently. However, it's a great way to use a compare to avoid a branch. So, if you do use it, document what you're doing and use casts liberally, if needed. Plus, I'd code it like this, assuming false=0 and true=1:

++delayIdx;
delayIdx *= (delayIdx < DelayLength);
It's not necessarily faster, just easier to read. And if your delayIdx bounds are a power of 2, I'd go with something like:

delayIdx = ++delayIdx & 0xFF; // 0 - 255
I started on Logic 5 with a PowerBook G4 550Mhz. I now have a MacBook Air M1 and it's ~165x faster! So, why is my music not proportionally better? :(

Post

JCJR wrote: Wed Nov 20, 2019 2:52 am delayIdx *= ((delayIdx += 1) < DelayLength);

Assuming jsfx compiles a branchless comparison, TRUE comparisons return 1 and FALSE comparisons return 0. Also jsfx seems to run fastest the fewer times you reference vars.

So the above line first incs delayIdx, then if the new [delayIdx < DelayLength] it multiplies the new delayIdx by 1 (no change), otherwise it multiplies the new delayIdx by 0, resetting the pointer to the buffer bottom.
I misspoke above - Xcode gives me an "unsequenced modification" compiler warning for this code.

Post

Fender19 wrote: Fri Nov 22, 2019 11:11 pm
JCJR wrote: Wed Nov 20, 2019 2:52 am delayIdx *= ((delayIdx += 1) < DelayLength);

Assuming jsfx compiles a branchless comparison, TRUE comparisons return 1 and FALSE comparisons return 0. Also jsfx seems to run fastest the fewer times you reference vars.

So the above line first incs delayIdx, then if the new [delayIdx < DelayLength] it multiplies the new delayIdx by 1 (no change), otherwise it multiplies the new delayIdx by 0, resetting the pointer to the buffer bottom.
I misspoke above - Xcode gives me an "unsequenced modification" compiler warning for this code.
This expression is UB. Nice catch by compiler.

Post

This expression is UB.

In C/C++ - yes. But JCJR posts his jsfx code ;). Obviously in C one is supposed to replace this expression with:

++delayIdx;
delayIdx *= (delayIdx < DelayLength);
Same for `delayIdx = ++delayIdx & 0xFF;` etc. etc. (now this one was no longer posted as jsfx so :x (<- :clown:)).
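For the power-of-two case, one way to sidestep the sequencing question altogether is to avoid modifying the variable inside the expression: compute the incremented value with `+ 1` and assign the masked result. This is only a sketch; `kDelayLength` and `advanceMasked` are hypothetical names, and the buffer size of 256 matches the `0xFF` mask used in the thread.

```cpp
#include <cstddef>

// Hypothetical buffer length for illustration; the mask trick only
// works when the length is a power of two.
constexpr std::size_t kDelayLength = 256;
static_assert(kDelayLength != 0 && (kDelayLength & (kDelayLength - 1)) == 0,
              "kDelayLength must be a power of two");

// Returns the next index, wrapping to 0 after kDelayLength - 1.
// The variable is read once and never modified inside the expression,
// so there is nothing unsequenced here.
inline std::size_t advanceMasked(std::size_t delayIdx) {
    return (delayIdx + 1) & (kDelayLength - 1);
}
```

Usage would be `delayIdx = advanceMasked(delayIdx);` once per sample.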

Post

Max M. wrote: Sat Nov 23, 2019 3:11 am This expression is UB.

In C/C++ - yes. But JCJR posts his jsfx code ;). Obviously in C one is supposed to replace this expression with:

++delayIdx;
delayIdx *= (delayIdx < DelayLength);
Same for `delayIdx = ++delayIdx & 0xFF;` etc. etc. (now this one was no longer posted as jsfx so :x (<- :clown:)).
Yes, my bad re the jsfx reference. Code shown above, on two lines, works fine in Xcode.

Post

syntonica wrote: Fri Nov 22, 2019 10:49 pm Using Booleans as real values is generally frowned upon. Depending on the language, "false"/"true" can be 0/-1, 0/1, negative/positive, 0 or unreal/everything else (positive or negative), etc. I've seen flavors of the same language do it differently. However, it's a great way to use a compare to avoid a branch.
Actually, in C++ booleans are either true or false and the compiler is free to represent them in whatever way it finds the most convenient (eg. CPU flags are pretty common). Conversion of a boolean into numeric types does give you either 1 (for true) or 0 (for false), but this is potentially an actual conversion that in theory might even involve a branch (well, not really, except as far as it would be standard compliant).

The code we are discussing is a pessimisation.

Just use "if(condition) index=0;" and you'll get either a well-predicted branch or a conditional move; either of these is faster than doing multiplications.
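To make the comparison concrete, here are the two variants side by side as a minimal sketch. The function names are hypothetical. Both produce the same index sequence; which one is faster on a given compiler and CPU is a question for a profiler or a disassembly, not something this example settles.

```cpp
// Wrap with a plain conditional, as suggested above. Compilers are
// free to turn this into a predicted branch or a conditional move.
inline int advanceBranch(int delayIdx, int delayLength) {
    ++delayIdx;
    if (delayIdx >= delayLength) delayIdx = 0;
    return delayIdx;
}

// Wrap by multiplying with the comparison result. In C++ the bool
// converts to exactly 0 or 1; the cast just makes that explicit.
inline int advanceMultiply(int delayIdx, int delayLength) {
    ++delayIdx;
    delayIdx *= static_cast<int>(delayIdx < delayLength);
    return delayIdx;
}
```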

Post

mystran wrote: Sat Nov 23, 2019 4:57 am The code we are discussing is a pessimisation.

Just use "if(condition) index=0;" and you'll get either a well-predicted branch or a conditional move; either of these is faster than doing multiplications.
I generally start with the naive solution, as you suggest, and then go from there to see if there are speedier alternatives, because the code profiler might disagree in the end! :lol: That's why I always set up tests to see what's fastest and avoid following the wisdom of the herd.

Post

Doing a bit of testing with Godbolt, clang seems to prefer CMOVcc while GCC opts for a branch.

MSVC seems to insist on keeping the index variable in memory (either directly operating on memory, or loading/storing on per-iteration basis), no matter what (ie. apparently even with modern MSVC one should still cache the delay index in a local; in other compilers this used to be a thing some 20 years ago). If the state is in globals, it uses CMOVcc, but with a struct it seems to go for a branch instead.

ICC seems to unroll the loop, then use CMOVcc in the unrolled part and branch in the remainder. When told not to unroll, it uses CMOVcc.

conclusions: don't use MSVC.
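For anyone who does have to build with MSVC, the "cache the delay index in a local" advice could look roughly like the sketch below. The `Delay` struct and its members are hypothetical and not taken from the Godbolt test code; the point is only that the loop works on local copies and writes the index back to the struct once at the end.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-length delay line, for illustration only.
// Assumes length >= 1.
struct Delay {
    std::vector<float> buffer;
    std::size_t index = 0;

    explicit Delay(std::size_t length) : buffer(length, 0.0f) {}

    // Replaces each sample in place with the one from buffer.size()
    // samples earlier.
    void process(float* samples, std::size_t count) {
        // Copy member state into locals so the compiler can keep it in
        // registers instead of reloading and storing it every iteration.
        std::size_t idx = index;
        const std::size_t length = buffer.size();
        float* buf = buffer.data();

        for (std::size_t n = 0; n < count; ++n) {
            const float delayed = buf[idx];
            buf[idx] = samples[n];
            samples[n] = delayed;
            if (++idx >= length) idx = 0;   // plain conditional wrap
        }

        index = idx;   // write the state back once
    }
};
```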

Post

The weird thing about MSVC is that it did optimize code to CMOV* somewhere around VC2002 or so. Then suddenly one day (in VC2003) they just totally forgot about this kind of instruction (the rumor was that this was just because of some bugs in the optimizer). And yet, after almost 20 years, it's still like they have no idea such instructions exist at all. Doh!

---
Though here they declare that at least after VS2017 CMOV may appear under certain conditions... Good morning!

Post

Max M. wrote: Sat Nov 23, 2019 3:11 am This expression is UB.

In C/C++ - yes. But JCJR posts his jsfx code ;). Obviously in C one is supposed to replace this expression with:

++delayIdx;
delayIdx *= (delayIdx < DelayLength);
Same for `delayIdx = ++delayIdx & 0xFF;` etc. etc. (now this one was no longer posted as jsfx so :x (<- :clown:)).
To add, C languages use sequence points to ensure the order of execution. C++ is way more flexible than C with regard to how an assignment operator can be used, return-by-reference is one example. Operations with an implicit sequence point, like pre or post increment, should not be written in a compound statement, regardless if it works or not, simply because it's easier to figure out and to avoid mistakes. When in doubt, use parentheses.

https://stackoverflow.com/questions/357 ... oints-in-c

Post

camsr wrote: Sat Nov 23, 2019 10:31 pm Operations with an implicit sequence point, like pre or post increment, should not be written in a compound statement, regardless if it works or not, simply because it's easier to figure out and to avoid mistakes.
There are no "implicit sequence points" with pre/post increments, and this is why mixing such operations with other access to the same variable without an explicit sequence point is undefined behaviour. Parentheses don't help either, because those just change how the parse tree is built.

Post

I should have written implicit assignment, thanks for pointing it out.
