Are denormals still an issue?

DSP, Plugin and Host development discussion.

Post

I was under the assumption that modern compilers and processors had gotten rid of the denormals issue, but this article from Earlevel in 2019 says differently: https://www.earlevel.com/main/2019/04/1 ... denormals/

Any thoughts?

Post

Denormals might not be quite as bad on modern CPUs as they were in the past (you used to be able to pretty much freeze the whole system with them), but you should still expect a performance hit.

Compilers won't normally do anything to avoid denormals as such, but as long as your floating-point code isn't using the legacy x87 FPU you can set the relevant CPU flags yourself (assuming the host doesn't do it already; I think in another thread about denormals on M1 someone suggested that Apple disables them by default already). With modern compilers the x87 FPU is only used for 32-bit builds if you don't enable the SSE2+ instruction sets (so if you compile 32-bit, sanity check that SSE2 is allowed); for amd64 (i.e. 64-bit builds) SSE2 is guaranteed to be available and the legacy FPU generally won't be used, except possibly for extended precision, which is not relevant for audio.
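One way to do the 32-bit sanity check at compile time is a preprocessor guard along these lines. This is only a sketch: it relies on the predefined macros `__i386__` and `__SSE2__` (GCC/Clang) and `_M_IX86` and `_M_IX86_FP` (MSVC), and the error text is made up for illustration.

```cpp
// Sketch: refuse to build 32-bit x86 code when SSE2 code generation is off,
// because float math would then go through the legacy x87 FPU, where the
// MXCSR FTZ/DAZ flags have no effect.
#if defined(__i386__) || defined(_M_IX86)
  #if !defined(__SSE2__) && !(defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    #error "32-bit x86 build without SSE2: enable it (e.g. -msse2 or /arch:SSE2)"
  #endif
#endif
```

With 32-bit GCC, enabling SSE2 alone may not be enough, since scalar math can still default to x87 unless `-mfpmath=sse` is also given, so it is worth checking the generated code once.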

So... the TL;DR version is, set (on entry, restore on exit) the CPU flags (eg. FTZ/DAZ on x86) and make sure your x86 32-bit compiler is allowed to use SSE2... and you won't see a denormal and won't need to worry about them.
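As a rough illustration of "set on entry, restore on exit" on x86, here is a minimal sketch using the standard SSE control-register intrinsics and macros. The helper name `halveSmallestNormalWithFtzDaz` is hypothetical and exists only to make the effect visible. It assumes the float math is compiled to SSE instructions, as on any 64-bit x86 build.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <cfloat>       // FLT_MIN

// Hypothetical helper: halves the smallest normal float while FTZ/DAZ are
// enabled and returns the result. The MXCSR value found on entry is restored
// before returning.
inline float halveSmallestNormalWithFtzDaz()
{
    const unsigned int saved = _mm_getcsr();             // save on entry

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          // denormal results become 0
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  // denormal inputs read as 0

    // volatile keeps the compiler from folding the multiply at compile time
    // and keeps the result store ahead of the register restore.
    volatile float x = FLT_MIN;
    volatile float half = 0.5f;
    volatile float y = x * half;  // would be a denormal without FTZ

    _mm_setcsr(saved);                                   // restore on exit
    return y;
}
```

With the flags set, the product is flushed to exactly zero instead of producing a denormal, and the caller sees the same MXCSR contents afterwards as before the call.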

Post

Mystran covered it, but if you want to see an example, scroll down to the two images in my article here:

https://www.earlevel.com/main/2019/04/1 ... denormals/

This was on my older computer (2009 Mac Pro, Nehalem), but as you can see the processor hit is 3x when playing nothing, if denormal protection is turned off. That's not nearly as bad as the old Pentium 4, IIRC, but can still murder a plugin.
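The reason "playing nothing" is the bad case is that recursive state (filter memories, feedback delays) keeps decaying toward zero once the input goes silent, and eventually lands in the denormal range. The sketch below is not from the article; `samplesUntilSubnormal` is a hypothetical helper that counts how long a bare one-pole feedback takes to get there, assuming denormal protection is off.

```cpp
#include <cmath>  // std::fpclassify, FP_SUBNORMAL

// Hypothetical helper: runs the feedback y[n] = coeff * y[n-1] on silent
// input, starting from 1.0f, and returns the number of samples until the
// state becomes subnormal (denormal). Returns -1 if that does not happen
// within maxSamples, for example because FTZ is already enabled and the
// state went straight to zero.
inline int samplesUntilSubnormal(float coeff, int maxSamples)
{
    volatile float state = 1.0f;  // volatile: force a real float multiply per step
    for (int n = 1; n <= maxSamples; ++n)
    {
        state = state * coeff;
        const float s = state;
        if (std::fpclassify(s) == FP_SUBNORMAL)
            return n;
        if (s == 0.0f)
            return -1;
    }
    return -1;
}
```

With a coefficient of 0.5 the state is an exact power of two and crosses below the smallest normal float (2^-126) on sample 127. A slowly decaying tail with a coefficient like 0.999 takes on the order of 87,000 samples, roughly two seconds at 44.1 kHz, and then sits in denormal territory for a long time afterwards.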
My audio DSP blog: earlevel.com

Post

earlevel wrote: Tue Jan 04, 2022 10:47 pm This was on my older computer (2009 Mac Pro, Nehalem), but as you can see the processor hit is 3x when playing nothing, if denormal protection is turned off. That's not nearly as bad as the old Pentium 4, IIRC, but can still murder a plugin.
I literally had to once (on an old computer, can't remember which CPU it was) hold the power button until force power-off to get a computer to recover from some code that was doing a bit too much denormal arithmetics ... so yeah, modern computers are better, but you don't need a whole lot to be just "better" in this case. :D

Post

mystran wrote: Tue Jan 04, 2022 11:08 pm I literally had to once (on an old computer, can't remember which CPU it was) hold the power button until force power-off to get a computer to recover from some code that was doing a bit too much denormal arithmetics ... so yeah, modern computers are better, but you don't need a whole lot to be just "better" in this case. :D
Similarly, when I first worked on native DSP code, I was aware of the issue but hadn't yet addressed it. Everything ran fine on my Power PC Mac, which had a modest penalty for denormals. I knew the Pentium 4 was considerably worse, but it did catch me off guard when I gave it the first run on Windows with a P4—locked it up completely and immediately...

Post

The approach below looks interesting. He uses a class whose object is destroyed as soon as it goes out of scope. However, I am not sure how badly creating and destroying objects affects performance. Maybe inline code that restores the state manually would be better.
He also does not seem to restore the original register content, so it can collide with the denormal settings of the DAW and/or other plugins.

https://github.com/rcliftonharvey/rchundenormal
https://www.tone2.com
Our award-winning synthesizers offer true high-end sound quality.

Post

JUCE has a very nice scoped crossplatform implementation here (class ScopedNoDenormals):
https://github.com/juce-framework/JUCE/ ... erations.h

And no, creating and destroying such an object on the stack does not affect performance (no overhead compared to calling set/unset functions manually, it really just calls the constructor and destructor). It'd be a bad idea to create it on the heap using new/delete though, because that'll allocate memory in the realtime thread, which we don't do around here. ;)

If you want to make sure it can be inlined because performance, you can implement it header-only. But that's probably overkill for something that is done once per process block.

Post

I haven't investigated the JUCE code in detail, but it seems that none of the solutions restores the old register status at the end. This is a no-go in the assembler world, since it can have unexpected side effects on other software/plugins/DAW.

I suggest this solution:
#include <xmmintrin.h>

// Call this at the beginning of your processing block.
inline unsigned int disableDenormals()
{
    const unsigned int maskFTZ = 0x8000; // MXCSR bit 15: FLUSH TO ZERO mode
    const unsigned int maskDAZ = 0x0040; // MXCSR bit 6: DENORMALS ARE ZERO mode
    const unsigned int oldRegisterStatus = _mm_getcsr();
    _mm_setcsr(oldRegisterStatus | maskFTZ | maskDAZ); // set both bits in one write
    return oldRegisterStatus;
}

// Restore the old register status at the end of your processing block.
inline void recoverOldDenormalsRegisterStatus(unsigned int oldRegisterStatus)
{
    _mm_setcsr(oldRegisterStatus);
}

void myPlugin::processReplacing (float **inputs, float **outputs, VstInt32 sampleFrames)
{
    const unsigned int oldRegisterStatus = disableDenormals();
    // ...
    // process your stuff here
    // ...
    recoverOldDenormalsRegisterStatus(oldRegisterStatus);
}

Post

It does:

Code: Select all

ScopedNoDenormals::ScopedNoDenormals() noexcept
{
  #if JUCE_USE_SSE_INTRINSICS || (JUCE_USE_ARM_NEON || defined (__arm64__) || defined (__aarch64__))
   #if JUCE_USE_SSE_INTRINSICS
    intptr_t mask = 0x8040;
   #else /*JUCE_USE_ARM_NEON*/
    intptr_t mask = (1 << 24 /* FZ */);
   #endif

    fpsr = FloatVectorOperations::getFpStatusRegister();
    FloatVectorOperations::setFpStatusRegister (fpsr | mask);
  #endif
}

ScopedNoDenormals::~ScopedNoDenormals() noexcept
{
  #if JUCE_USE_SSE_INTRINSICS || (JUCE_USE_ARM_NEON || defined (__arm64__) || defined (__aarch64__))
    FloatVectorOperations::setFpStatusRegister (fpsr);
  #endif
}

Post

This is pretty much a canonical example of a situation where you really want to use a RAII wrapper (eg. similar to the JUCE one).
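For anyone not using JUCE, a standalone wrapper could look roughly like this. It is a minimal x86-only sketch, not the JUCE implementation: the names `ScopedFlushDenormals` and `processBlock` are hypothetical, and the bit masks are the FTZ (0x8000) and DAZ (0x0040) values already quoted in this thread.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

// Hypothetical RAII guard (x86 SSE only): sets FTZ and DAZ in MXCSR on
// construction and restores the previous register value on destruction.
class ScopedFlushDenormals
{
public:
    ScopedFlushDenormals() noexcept
        : savedCsr(_mm_getcsr())
    {
        _mm_setcsr(savedCsr | kFtzBit | kDazBit);
    }

    ~ScopedFlushDenormals() noexcept
    {
        _mm_setcsr(savedCsr);  // runs on every exit path, including early returns
    }

    // Non-copyable: a copy would restore the register a second time.
    ScopedFlushDenormals(const ScopedFlushDenormals&) = delete;
    ScopedFlushDenormals& operator=(const ScopedFlushDenormals&) = delete;

private:
    static constexpr unsigned int kFtzBit = 0x8000;  // flush-to-zero
    static constexpr unsigned int kDazBit = 0x0040;  // denormals-are-zero
    const unsigned int savedCsr;
};

// Hypothetical process callback showing the intended usage.
inline void processBlock(float* buffer, int numSamples)
{
    const ScopedFlushDenormals guard;  // lives on the stack, no allocation
    for (int i = 0; i < numSamples; ++i)
        buffer[i] *= 0.5f;             // stand-in for the real DSP
}                                      // MXCSR restored here
```

The point of the wrapper over a manual set/restore pair is that the restore cannot be forgotten: any early return or exception out of the block still runs the destructor. On ARM the same shape applies, with the FZ bit (bit 24) of the floating-point control register instead, as in the JUCE snippet above.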
