I think there were a bit more than just several jumps, but I'm not sure of the details anymore.
mystran wrote: ↑Wed Nov 20, 2019 10:34 am
It's actually not a huge penalty on x86. With SSE at least, the actual comparisons are generic (similar to integer CMP) and just set flags (ie. ZF/CF/PF). If either operand is a NaN, then PF is set. While you can check any combination of ZF/CF together, you can only check PF on its own, so if you care about NaNs you need to check the flags twice. For regular branches, this just means you emit two conditional jumps (first check parity, then whatever comparison you had) rather than one.
Optimize plugin code for balanced load or least load?
-
- KVRAF
- 1607 posts since 12 Apr, 2002
-
- KVRAF
- 1607 posts since 12 Apr, 2002
It's not about NaNs appearing or not. It's about generating NaN compliant code, which is what the options control.
syntonica wrote: ↑Wed Nov 20, 2019 10:41 am
I've weeded out most all possible sources of NaNs. The one place I need to check isn't horribly critical, in the wave display in the GUI. I just make sure the sample equals itself since NaNs don't. It's cheaper to do that check rather than to block the thread until the wave calculation is complete.
- KVRAF
- 7899 posts since 12 Feb, 2006 from Helsinki, Finland
Nah, that's literally all there is.
Additionally, since NaN also sets ZF and CF, if you arrange your condition code checks to always use the ZF=0 and/or CF=0 cases, then those will fail for NaNs as well, allowing you to skip the parity check. So the separate parity check is only really necessary if you want to branch (or do something else) on boolean false.
edit: although I suppose for branching on false you could arrange to swap operands for CF checks, so it might actually only be required when you want to branch on equality comparison failure
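To make the "unordered" case concrete at the source level, here is a minimal C++ sketch, assuming IEEE-754 doubles and a build without fast-math style flags. The function names are made up for illustration. It shows why a compiler that cares about NaNs cannot freely swap `a < b` for `!(a >= b)`, and why the self-comparison trick mentioned earlier works.

```cpp
#include <cassert>
#include <limits>

// Ordered comparison: false whenever either operand is a NaN.
bool less_than(double a, double b) { return a < b; }

// Looks equivalent, but is true when either operand is a NaN,
// because (a >= b) is false for unordered operands.
bool not_greater_equal(double a, double b) { return !(a >= b); }

// Self-comparison test: only a NaN compares unequal to itself.
bool is_nan_by_self_compare(double x) { return x != x; }

// Small self-check (hypothetical helper, not from the thread).
void check_nan_comparisons() {
    const double qnan = std::numeric_limits<double>::quiet_NaN();
    // For ordinary numbers the two forms agree.
    assert(less_than(1.0, 2.0) && not_greater_equal(1.0, 2.0));
    // With a NaN involved they disagree.
    assert(!less_than(qnan, 1.0));
    assert(not_greater_equal(qnan, 1.0));
    assert(is_nan_by_self_compare(qnan));
    assert(!is_nan_by_self_compare(0.5));
}
```

Once the optimiser is told to ignore NaNs, it may treat the first two functions as interchangeable and may fold the third to `false`.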
-
- KVRian
- Topic Starter
- 628 posts since 30 Aug, 2012
Working through my code with these new tips has been quite intriguing and successful. It's interesting how changing one thing often leads to "oh, and now I can do this - and this - etc."
I'm learning a lot here! Thanks all!
- KVRAF
- 2245 posts since 25 Sep, 2014 from Specific Northwest
Then I'm probably not understanding the point here. I understood it to mean that all NaN checks are turned off and any mathematical operation with one or resulting in one returns a NaN rather than throwing an error.
Z1202 wrote: ↑Wed Nov 20, 2019 11:15 am
It's not about NaNs appearing or not. It's about generating NaN compliant code, which is what the options control.
syntonica wrote: ↑Wed Nov 20, 2019 10:41 am
I've weeded out most all possible sources of NaNs. The one place I need to check isn't horribly critical, in the wave display in the GUI. I just make sure the sample equals itself since NaNs don't. It's cheaper to do that check rather than to block the thread until the wave calculation is complete.
I started on Logic 5 with a PowerBook G4 550Mhz. I now have a MacBook Air M1 and it's ~165x faster! So, why is my music not proportionally better?
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
Thanks DJ Warmonger and mystran. Ah, reinvented something common enough to be named! Now that I'm on a roll dunno which to invent next, the loop or the linked list?
DJ Warmonger wrote: ↑Wed Nov 20, 2019 7:56 am
This is called "multiplication by predicate" and is a common optimization technique for all conditional instructions. I already saw it recommended in CUDA code as well as Synthmaker code module.
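For anyone who has not seen "multiplication by predicate" before, a minimal C++ sketch follows. This is not the code from earlier in the thread, just an illustrative gate with made-up names: the comparison result (0 or 1) is converted to a float and used as a multiplier, so no branch is needed.

```cpp
#include <cassert>

// Branching version: pass the sample through if it reaches the threshold,
// otherwise output silence.
float gate_branch(float x, float threshold) {
    if (x >= threshold) return x;
    return 0.0f;
}

// Predicate version: (x >= threshold) is 0 or 1, converted to 0.0f or 1.0f
// and used as a multiplier instead of a branch.
float gate_predicate(float x, float threshold) {
    return x * static_cast<float>(x >= threshold);
}

// Small self-check (hypothetical helper).
void check_gate() {
    assert(gate_predicate(0.5f, 0.25f) == gate_branch(0.5f, 0.25f));
    assert(gate_predicate(0.1f, 0.25f) == gate_branch(0.1f, 0.25f));
    assert(gate_predicate(-1.0f, 0.25f) == gate_branch(-1.0f, 0.25f));
}
```

The two versions agree for finite inputs, but not for everything: an infinite sample multiplied by zero gives a NaN, whereas the branch would return a clean zero.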
- KVRAF
- 7899 posts since 12 Feb, 2006 from Helsinki, Finland
Let me try to explain this then. First of all, you normally don't get any exceptions thrown, unless you ask for such a thing. The CPU can raise floating point exceptions on invalid/ill-defined operations, but normally these are all masked and we'll just get a NaN as the result instead. This all happens on the CPU hardware level and it's the standard model of floating point computation in C/C++ independent of whatever optimisations you might enable.
Now, what the compiler flags with regards to NaNs do is simply instruct the optimiser to ignore the possibility of NaNs. They will still happen for invalid operations, but the optimiser no longer needs to worry about following the rules of NaN propagation. This can enable some optimisations that would not otherwise be correct, especially when combined with "finite math only" (ie. optimiser is also allowed to ignore the possibility of infinities) and "enable-unsafe-fp-math" (ie. optimiser is essentially allowed to perform algebraic simplification without worrying about finite precision rounding).
For example, with those three flags, we can essentially assume that (a+b-a) simplifies into b. This is "unsafe" because the simplified result is not rounded (ie. large "a" no longer loses precision in "b"), but it's also wrong if a is either an infinity or NaN, because in either of these cases the result should be a NaN instead.
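A minimal C++ sketch of that example, assuming IEEE-754 doubles and a build without any of the relaxed math flags (the function name is made up). Compiled strictly, the expression keeps its rounding and its NaN behaviour. With the relaxed flags the compiler would be allowed to return `b` directly.

```cpp
#include <cassert>
#include <limits>

// Evaluates (a + b) - a in the written order, as strict IEEE rules require.
double add_then_subtract(double a, double b) {
    return (a + b) - a;
}

// Small self-check (hypothetical helper); only valid for a strict build.
void check_add_then_subtract() {
    // Small values: nothing is lost, the result is b.
    assert(add_then_subtract(1.0, 2.0) == 2.0);
    // Large a: b is rounded away in (a + b), so the result is 0, not 1.
    assert(add_then_subtract(1e20, 1.0) == 0.0);
    // Infinite a: inf - inf is a NaN, which compares unequal to itself.
    const double inf = std::numeric_limits<double>::infinity();
    const double r = add_then_subtract(inf, 1.0);
    assert(r != r);
}
```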
The other thing that ignoring NaNs allows us to do is generate code where branches ignore the possibility of "unordered" results (ie. those where NaNs were involved). This basically means that when you hit a branch with a condition involving NaNs, rather than always taking the "false" branch, we might instead take the "true" branch if this happens to result in more efficient native code.
The bottom line is: these flags don't really "change NaNs" in any way, they just control whether or not the optimiser is required to worry about them.
- KVRAF
- 7899 posts since 12 Feb, 2006 from Helsinki, Finland
It should be noted that "unsafe math optimisations" are really "unsafe" in the sense that in some situations they can also lose precision (and sometimes catastrophically so). For example, numerical integration or some trigonometric recurrences can rely on a particular ordering of operations (and sometimes slightly less efficient code) to preserve precision and in some cases changing the order of operations can even make them numerically unstable.
I haven't really personally observed problems with this in practice, but I can certainly come up with examples where a "sufficiently smart compiler" could optimise a stable algorithm into an unstable one when such optimisations are enabled.
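One well-known case of this kind is Kahan (compensated) summation. It is not mentioned above and is used here only as an illustration. The sketch below assumes IEEE-754 single precision evaluated as plain floats, and a strict build. The correction term is algebraically zero, so an optimiser that is allowed to reassociate may simplify it away and leave you with naive summation.

```cpp
#include <cassert>
#include <cstddef>

// Plain left-to-right summation.
float naive_sum(const float* data, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Kahan summation: c tracks the low-order bits lost in each addition.
float kahan_sum(const float* data, std::size_t n) {
    float sum = 0.0f;
    float c = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        float y = data[i] - c;   // apply the previous correction
        float t = sum + y;       // low-order bits of y may be lost here
        c = (t - sum) - y;       // recover what was lost (zero in exact arithmetic)
        sum = t;
    }
    return sum;
}

// Small self-check (hypothetical helper); only valid for a strict build.
void check_kahan() {
    // Near 1e8 adjacent floats are 8 apart, so adding 1.0f alone is lost.
    const float data[9] = {1e8f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f};
    assert(naive_sum(data, 9) == 100000000.0f);
    assert(kahan_sum(data, 9) == 100000008.0f);
}
```

If the second assert fails in an optimised build, that is a hint that the compensation has been optimised out.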
- KVRAF
- 2245 posts since 25 Sep, 2014 from Specific Northwest
So, my basic understanding is basically correct, despite being overly, uh... basic. The compiler just fully ignores the possibility of NaNs rather than saying, Look! A NaN! You shall pass! Which now makes sense to me, thinking about compile-time vs runtime. Since I've rooted out and eliminated all causes of NaNs in my math, I know I can safely turn off all checks and know that my speakers (and eardrums!) will be safe. This is one of those topics that I'd have to delve further into if I was doing something other than audio.
Also, I did a recheck on -O0 + Relax IEEE Compliance and there is a not insignificant increase in efficiency. However, for some reason, if used with -Ofast, it runs more slowly than with just -Ofast alone. I'm just going to ignore that setting from here on in.
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
Syntonica, is all your stuff only sound generators, or do you sometimes process input to output?
If you process input then a misbehaving upstream plugin might feed you lots of nans?
It could depend on how often a host decides to scan streams and weed them out. Maybe in the case of chained plugins, not all hosts would decide to prune out nans in-between each plugin? Dunno.
- KVRAF
- 7899 posts since 12 Feb, 2006 from Helsinki, Finland
If you want to sanitise input, you can use std::isnan() for MSVC and __isnan() for clang and GCC. Note that both clang and GCC can optimise std::isnan() into a NOP when using fast-math (which is about as retarded as it gets, but whatever), so you really have to use __isnan() instead.
edit: Just to play safe you might want to put an assert() into your code that checks for this, just so that you'll notice if a newer version of a compiler decides to break your NaN-tests. If everything else fails, it's always possible to compile the sanitiser into a static lib with more conservative optimisations and then link that into the binary.
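As an alternative sketch, the test can be done on the raw bit pattern. This assumes 32-bit IEEE-754 floats, and the names are made up. Since it is integer code, the floating point flags should have less room to remove it, though that is worth verifying on your own compiler. The second function is the kind of assert() self-test suggested above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// NaN test on the bit pattern: exponent bits all ones and a non-zero mantissa.
inline bool is_nan_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);         // well-defined way to read the bits
    return (bits & 0x7fffffffu) > 0x7f800000u;   // drop the sign, compare against +inf
}

// Self-test to run in debug builds, so a compiler change that breaks the
// NaN test gets noticed (hypothetical helper).
inline void check_nan_test() {
    assert(is_nan_bits(std::numeric_limits<float>::quiet_NaN()));
    assert(!is_nan_bits(std::numeric_limits<float>::infinity()));
    assert(!is_nan_bits(-std::numeric_limits<float>::infinity()));
    assert(!is_nan_bits(0.0f));
    assert(!is_nan_bits(1.0f));
}
```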
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
Thanks mystran. When I was doing hosting I had a tight asm loop to strip nans, denorms and infs out of a buffer in place.
Can't recall if I called it every time in-between every chained plugin.
Maybe it is a low enough probability to be considered excessively paranoid to strip on every intermediate plugin output buffer. Dunno. Computers keep getting faster but even quick loops can slow you down if you call em often enough.
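For reference, a plain C++ sketch of such a stripping loop (not the asm version described above; names are made up, and it assumes 32-bit IEEE-754 floats). It only looks at the exponent field: all ones means infinity or NaN, all zeros means zero or denormal, and in both cases the sample is replaced with zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Replace NaNs, infinities and denormals in a buffer with zero, in place.
inline void sanitize_buffer(float* buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &buf[i], sizeof bits);
        const std::uint32_t exponent = bits & 0x7f800000u;
        // Exponent all ones: inf or NaN. Exponent zero: zero or denormal.
        if (exponent == 0x7f800000u || exponent == 0u) buf[i] = 0.0f;
    }
}

// Small self-check (hypothetical helper).
inline void check_sanitize_buffer() {
    float buf[6] = {
        1.0f,
        std::numeric_limits<float>::quiet_NaN(),
        std::numeric_limits<float>::infinity(),
        -std::numeric_limits<float>::infinity(),
        std::numeric_limits<float>::denorm_min(),
        -0.5f,
    };
    sanitize_buffer(buf, 6);
    assert(buf[0] == 1.0f);
    assert(buf[1] == 0.0f && buf[2] == 0.0f && buf[3] == 0.0f && buf[4] == 0.0f);
    assert(buf[5] == -0.5f);
}
```

It is one integer mask and compare per sample, so the cost is mostly the extra pass over the buffer.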
- KVRAF
- 2245 posts since 25 Sep, 2014 from Specific Northwest
I've just started processing input. I shouldn't need to police the input buffers, should I?
JCJR wrote: ↑Thu Nov 21, 2019 4:31 am
Syntonica, is all your stuff only sound generators, or do you sometimes process input to output?
If you process input then a misbehaving upstream plugin might feed you lots of nans?
It could depend on how often a host decides to scan streams and weed them out. Maybe in the case of chained plugins, not all hosts would decide to prune out nans in-between each plugin? Dunno.
I'm sure if there's a plugin chain, the output buffer from plugin1 just gets passed to plugin2 as the input buffer and so on and the host just waits for the final output. Judging by my failures, I haven't heard any host sanitizing input or output and only some have limiters.
- KVRAF
- 7899 posts since 12 Feb, 2006 from Helsinki, Finland
Some hosts can actually crash on memory corruption if you pass them NaNs (and maybe infinities too).
syntonica wrote: ↑Thu Nov 21, 2019 7:21 am
I'm sure if there's a plugin chain, the output buffer from plugin1 just gets passed to plugin2 as the input buffer and so on and the host just waits for the final output. Judging by my failures, I haven't heard any host sanitizing input or output and only some have limiters.
- KVRist
- 243 posts since 24 Aug, 2014
This is a rather primitive view of the process. There could be things like delay compensation, precision conversion, loudness metering, etc. in between.
syntonica wrote: ↑Thu Nov 21, 2019 7:21 am
I'm sure if there's a plugin chain, the output buffer from plugin1 just gets passed to plugin2 as the input buffer and so on and the host just waits for the final output. Judging by my failures, I haven't heard any host sanitizing input or output and only some have limiters.