Optimize plugin code for balanced load or least load?

DSP, Plugin and Host development discussion.

Post

mystran wrote: Wed Nov 20, 2019 10:34 am
Z1202 wrote: Wed Nov 20, 2019 9:31 am Hmmm, that's weird. For one I remember that e.g. NaN handling can make comparisons much slower, since they need (at least in certain cases) to be compiled into a sequence of several commands if NaNs are to be properly handled.
It's actually not a huge penalty on x86. With SSE at least, the actual comparisons are generic (similar to integer CMP) and just set flags (ie. ZF/CF/PF). If either operand is a NaN, then PF is set. While you can check any combination of ZF/CF together, you can only check PF on its own, so if you care about NaNs you need to check the flags twice. For regular branches, this just means you emit two conditional jumps (first check parity, then whatever comparison you had) rather than one.
I think there were a bit more than just several jumps, but I'm not sure of the details anymore.

Post

syntonica wrote: Wed Nov 20, 2019 10:41 am I've weeded out most all possible sources of NaNs. The one place I need to check isn't horribly critical, in the wave display in the GUI. I just make sure the sample equals itself since NaNs don't. It's cheaper to do that check rather than to block the thread until the wave calculation is complete.
It's not about NaNs appearing or not. It's about generating NaN compliant code, which is what the options control.

Post

Z1202 wrote: Wed Nov 20, 2019 11:13 am I think there were a bit more than just several jumps, but I'm not sure of the details anymore.
Nah, that's literally all there is.

Additionally, since NaN also sets ZF and CF, if you arrange your condition code checks to always use the ZF=0 and/or CF=0 cases, then those will fail for NaNs as well, allowing you to skip the parity check. So the separate parity check is only really necessary if you want to branch (or do something else) on boolean false.

edit: although I suppose for branching on false you could arrange to swap operands for CF checks, so it might actually only be required when you want to branch on equality comparison failure
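
To illustrate the unordered case at the C++ level, here is a minimal sketch, assuming strict IEEE semantics (no fast-math style flags). The names `compare_all`, `not_less`, `greater_or_equal` and `quiet_nan` are made up for this example. With a NaN operand every ordered comparison and `==` comes out false while `!=` comes out true, which is why `!(a < b)` and `a >= b` stop being interchangeable.

```cpp
#include <limits>

// Collects the result of every comparison operator for one pair of floats.
struct CompareResults {
    bool lt, le, gt, ge, eq, ne;
};

CompareResults compare_all(float a, float b) {
    return { a < b, a <= b, a > b, a >= b, a == b, a != b };
}

// These two agree for ordinary numbers but differ when either operand
// is a NaN (the "unordered" case): the first is true, the second false.
bool not_less(float a, float b)         { return !(a < b); }
bool greater_or_equal(float a, float b) { return a >= b; }

// Hypothetical helper so callers do not need <limits> themselves.
float quiet_nan() { return std::numeric_limits<float>::quiet_NaN(); }
```

Calling `compare_all(quiet_nan(), 1.0f)` gives false for `lt`, `le`, `gt`, `ge` and `eq`, and true for `ne`. A compiler that is told to ignore NaNs is free to treat `not_less` and `greater_or_equal` as the same function.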

Post

syntonica wrote: Mon Nov 18, 2019 8:21 pm Your compiler can handle quite a bit of optimization for you, but not all. I wish there were modern guidelines, but most people just toe the party line with "let the compiler handle it." :dog: Lame, because this is the most fun part of programming for me!
Working through my code with these new tips has been quite intriguing and successful. It's interesting how changing one thing often leads to "oh, and now I can do this - and this - etc."

I'm learning a lot here! Thanks all!

Post

Z1202 wrote: Wed Nov 20, 2019 11:15 am
syntonica wrote: Wed Nov 20, 2019 10:41 am I've weeded out most all possible sources of NaNs. The one place I need to check isn't horribly critical, in the wave display in the GUI. I just make sure the sample equals itself since NaNs don't. It's cheaper to do that check rather than to block the thread until the wave calculation is complete.
It's not about NaNs appearing or not. It's about generating NaN compliant code, which is what the options control.
Then I'm probably not understanding the point here. I understood it to mean that all NaN checks are turned off and any mathematical operation with one or resulting in one returns a NaN rather than throwing an error.
I started on Logic 5 with a PowerBook G4 550Mhz. I now have a MacBook Air M1 and it's ~165x faster! So, why is my music not proportionally better? :(

Post

DJ Warmonger wrote: Wed Nov 20, 2019 7:56 am This is called "multiplication by predicate" and is common optimization technique for all conditional instructions. I already saw it recommended in CUDA code as well as Synthmaker code module.
Thanks DJ Warmonger and mystran. Ah, reinvented something common enough to be named! Now that I'm on a roll dunno which to invent next, the loop or the linked list? :)
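
For readers who have not met the trick, a generic sketch of "multiplication by predicate" might look like this. It is not the code from earlier in the thread; the gate example and the names `gate_branch` and `gate_predicate` are assumptions made purely for illustration.

```cpp
// Branching version: passes the sample through only if it reaches the threshold.
float gate_branch(float x, float threshold) {
    if (x >= threshold) return x;
    return 0.0f;
}

// Predicate version: the comparison converts to 0.0f or 1.0f,
// and that factor scales the sample instead of a jump selecting it.
float gate_predicate(float x, float threshold) {
    return x * static_cast<float>(x >= threshold);
}
```

The two versions agree for finite input. They can differ when the sample is a NaN or an infinity, because either of those multiplied by zero is NaN rather than zero.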

Post

syntonica wrote: Wed Nov 20, 2019 7:42 pm Then I'm probably not understanding the point here. I understood it to mean that all NaN checks are turned off and any mathematical operation with one or resulting in one returns a NaN rather than throwing an error.
Let me try to explain this then. First of all, you normally don't get any exceptions thrown, unless you ask for such a thing. The CPU can raise floating point exceptions on invalid/ill-defined operations, but normally these are all masked and we'll just get a NaN as the result instead. This all happens on the CPU hardware level and it's the standard model of floating point computation in C/C++ independent of whatever optimisations you might enable.

Now, what the compiler flags with regards to NaNs do is simply instruct the optimiser to ignore the possibility of NaNs. They will still happen for invalid operations, but the optimiser no longer needs to worry about following the rules of NaN propagation. This can enable some optimisations that would not otherwise be correct, especially when combined with "finite math only" (ie. optimiser is also allowed to ignore the possibility of infinities) and "enable-unsafe-fp-math" (ie. optimiser is essentially allowed to perform algebraic simplification without worrying about finite precision rounding).

For example, with those three flags, we can essentially assume that (a+b-a) simplifies into b. This is "unsafe" because the simplified result is not rounded (ie. large "a" no longer loses precision in "b"), but it's also wrong if a is either an infinity or NaN, because in either of these cases the result should be a NaN instead.
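
The (a+b-a) case can be checked with a small sketch like the following, assuming it is compiled without any of the relaxed floating point flags so that both operations are kept. The names `add_then_subtract`, `is_nan_result` and `kInf` are hypothetical.

```cpp
#include <limits>

// Evaluates (a + b) - a in the order written. Under strict IEEE rules
// the compiler has to keep both operations and their rounding.
double add_then_subtract(double a, double b) {
    return (a + b) - a;
}

// NaN is the only value that does not compare equal to itself.
bool is_nan_result(double v) { return v != v; }

const double kInf = std::numeric_limits<double>::infinity();

// Strict results for a few inputs:
//   add_then_subtract(1.0,   2.0) -> 2.0  (nothing is lost)
//   add_then_subtract(1e300, 1.0) -> 0.0  (b vanishes when added to a huge a)
//   add_then_subtract(kInf,  1.0) -> NaN  (inf - inf is invalid)
```

With the simplification to plain `b` allowed, all three calls would return their second argument instead.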

The other thing that ignoring NaNs allows us to do is generate code where branches ignore the possibility of "unordered" results (ie. those where NaNs were involved). This basically means that when you hit a branch with a condition involving NaNs, rather than always taking the "false" branch, we might instead take the "true" branch if this happens to result in more efficient native code.

The bottom line is: these flags don't really "change NaNs" in any way, they just control whether or not the optimiser is required to worry about them.

Post

It should be noted that "unsafe math optimisations" are really "unsafe" in the sense that in some situations they can also lose precision (and sometimes catastrophically so). For example, numerical integration or some trigonometric recurrences can rely on a particular ordering of operations (and sometimes slightly less efficient code) to preserve precision and in some cases changing the order of operations can even make them numerically unstable.

I haven't really personally observed problems with this in practice, but I can certainly come up with examples where a "sufficiently smart compiler" could optimise a stable algorithm into an unstable one when such optimisations are enabled.
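
One textbook case of code that depends on operation order is compensated (Kahan) summation. It is offered here only as an illustration and was not named in the post above; the function names are made up. The correction term `c` is algebraically zero, so an optimiser that is allowed to reassociate may remove it, at which point the routine degrades to the naive loop.

```cpp
#include <cstddef>

// Plain accumulation: terms far smaller than the running sum are rounded away.
double naive_sum(double start, double term, std::size_t count) {
    double sum = start;
    for (std::size_t i = 0; i < count; ++i) sum += term;
    return sum;
}

// Kahan summation: c carries the low-order bits lost by each addition.
double kahan_sum(double start, double term, std::size_t count) {
    double sum = start;
    double c = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        double y = term - c;   // apply the correction from the previous step
        double t = sum + y;    // low-order bits of y may be lost here
        c = (t - sum) - y;     // zero in exact algebra, the rounding error in floats
        sum = t;
    }
    return sum;
}
```

Under strict IEEE evaluation, `naive_sum(1.0, 1e-16, 1000000)` stays at exactly 1.0 because each term is below half an ulp of 1.0, while `kahan_sum` with the same arguments lands very close to 1.0000000001.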

Post

mystran wrote: Thu Nov 21, 2019 2:08 am Now, what the compiler flags with regards to NaNs do is simply instruct the optimiser to ignore the possibility of NaNs.
So, my basic understanding is basically correct, despite being overly, uh... basic. The compiler just fully ignores the possibility of NaNs rather than saying, Look! A NaN! You shall pass! :lol: Which now makes sense to me, thinking about compile-time vs runtime. Since I've rooted out and eliminated all causes of NaNs in my math, I know I can safely turn off all checks and know that my speakers (and eardrums!) will be safe. This is one of those topics that I'd have to delve further into if I was doing something other than audio.

Also, I did a recheck on -O0 + Relax IEEE Compliance and there is a not insignificant increase in efficiency. However, for some reason, if used with -Ofast, it runs more slowly than with just -Ofast alone. I'm just going to ignore that setting from here on in. :hihi:

Post

Syntonica, is all your stuff only sound generators, or do you sometimes process input to output?

If you process input then a misbehaving upstream plugin might feed you lots of nans?

It could depend on how often a host decides to scan streams and weed them out. Maybe in the case of chained plugins, not all hosts would decide to prune out nans in-between each plugin? Dunno.

Post

JCJR wrote: Thu Nov 21, 2019 4:31 am If you process input then a misbehaving upstream plugin might feed you lots of nans?
If you want to sanitise input, you can use std::isnan() for MSVC and __isnan() for clang and GCC. Note that both clang and GCC can optimise std::isnan() into a NOP when using fast-math (which is about as absurd as it gets, but whatever), so you really have to use __isnan() instead.

edit: Just to play safe you might want to put an assert() into your code that checks for this, just so that you'll notice if a newer version of a compiler decides to break your NaN-tests. If everything else fails, it's always possible to compile the sanitiser into a static lib with more conservative optimisations and then link that into the binary.
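
As an alternative to relying on a particular isnan spelling, here is a minimal sketch that inspects the bit pattern with integer operations, together with a self test in the spirit of the assert() suggestion. This is a swapped-in technique, not the method described in the post; it assumes `float` is IEEE 754 binary32, and `is_nan_bits` and `nan_test_still_works` are hypothetical names. Because no floating point comparison is involved, it should not be affected by flags that let the optimiser ignore NaNs, though that is worth verifying on your own toolchain.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// NaN in binary32: exponent bits all ones and a non-zero mantissa.
// After masking off the sign, that is any pattern above the infinity pattern.
bool is_nan_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // well-defined way to read the bits
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

// Intended to be wrapped in assert() at start-up, so a compiler upgrade
// that breaks the NaN test gets noticed.
bool nan_test_still_works() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float inf = std::numeric_limits<float>::infinity();
    return is_nan_bits(nan) && !is_nan_bits(inf) &&
           !is_nan_bits(0.0f) && !is_nan_bits(-1.5f);
}
```

A typical use would be `assert(nan_test_still_works());` once during plugin initialisation.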

Post

Thanks mystran. When I was doing hosting, I had a tight asm loop to strip out nans, denorms and infs from a buffer in place.

Can't recall if I called it every time in-between every chained plugin.

Maybe it is a low enough probability to be considered excessively paranoid to strip on every intermediate plugin output buffer. Dunno. Computers keep getting faster but even quick loops can slow you down if you call em often enough.
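
A portable C++ stand-in for that kind of loop could look like the sketch below. It is not the original asm routine; `scrub_buffer` is a hypothetical name, IEEE 754 binary32 floats are assumed, and replacing bad samples with zero is just one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Replaces NaNs, infinities and denormals with 0.0f in place.
// Returns the number of samples that were replaced.
std::size_t scrub_buffer(float* buf, std::size_t n) {
    std::size_t replaced = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &buf[i], sizeof bits);
        const std::uint32_t exponent = bits & 0x7f800000u;
        const std::uint32_t mantissa = bits & 0x007fffffu;
        const bool nan_or_inf = (exponent == 0x7f800000u);        // exponent all ones
        const bool denormal   = (exponent == 0u) && (mantissa != 0u);
        if (nan_or_inf || denormal) {
            buf[i] = 0.0f;
            ++replaced;
        }
    }
    return replaced;
}
```

The return value makes it cheap to log how often an upstream plugin actually delivers bad samples, which helps decide whether scrubbing between every pair of plugins is worth the cost.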

Post

JCJR wrote: Thu Nov 21, 2019 4:31 am Syntonica, is all your stuff only sound generators, or do you sometimes process input to output?

If you process input then a misbehaving upstream plugin might feed you lots of nans?

It could depend on how often a host decides to scan streams and weed them out. Maybe in the case of chained plugins, not all hosts would decide to prune out nans in-between each plugin? Dunno.
I've just started processing input. I shouldn't need to police the input buffers, should I? :scared:

I'm sure if there's a plugin chain, the output buffer from plugin1 just gets passed to plugin2 as the input buffer and so on and the host just waits for the final output. Judging by my failures, I haven't heard any host sanitizing input or output and only some have limiters.

Post

syntonica wrote: Thu Nov 21, 2019 7:21 am I'm sure if there's a plugin chain, the output buffer from plugin1 just gets passed to plugin2 as the input buffer and so on and the host just waits for the final output. Judging by my failures, I haven't heard any host sanitizing input or output and only some have limiters.
Some hosts can actually crash on memory corruption if you pass them NaNs (and maybe infinities too).

Post

syntonica wrote: Thu Nov 21, 2019 7:21 am I'm sure if there's a plugin chain, the output buffer from plugin1 just gets passed to plugin2 as the input buffer and so on and the host just waits for the final output. Judging by my failures, I haven't heard any host sanitizing input or output and only some have limiters.
This is a somewhat primitive view of the process. There could be things like delay compensation, precision conversion, loudness metering, etc. in between.
