In a dither over "dither"

DSP, Plugin and Host development discussion.

Post

Fender19 wrote: Fri Sep 18, 2020 5:39 pm 1) the lowest level a 16bit signed int can capture is -90.3dBFS. So why is the dither signal - at -90.3dBFS - AUDIBLE in the 16bit wave files exported from Cubase and Reaper?
Why would it not be? At 16-bits as long as it's not stuck at a single quantization level it's probably going to be audible (at least in good conditions and when not masked by louder signals).

That said, I would like to point out that thinking in terms of 1 lsb here is not terribly helpful, because we don't hear time-domain waveforms, but rather frequencies. When flat TPDF is applied the noise floor in spectral domain is around 112dB below that of a full-scale sinewave and anything above that can be represented just fine. With noise shaping you can gain even more dynamic range in selected parts of the spectrum, at the cost of losing it elsewhere.
2) the peak level of a 16bit wave file in the positive excursion is 32767 so why are we scaling the dither signal by 1<<15 (32768) and not 1<<15 - 1 (32767) ?
While you're probably technically correct, this makes very little difference in practice.

Post

mystran wrote: Fri Sep 18, 2020 6:15 pm
Fender19 wrote: Fri Sep 18, 2020 5:39 pm 1) the lowest level a 16bit signed int can capture is -90.3dBFS. So why is the dither signal - at -90.3dBFS - AUDIBLE in the 16bit wave files exported from Cubase and Reaper?
Why would it not be?
Because most of the signal is below -90.3dBFS. Only the peaks touch it. So I would think all we should capture is a few random bits - which IS what happens in Wavelab and Pro Tools.

Post

Fender19 wrote: Fri Sep 18, 2020 6:22 pm Because most of the signal is below -90.3dBFS. Only the peaks touch it. So I would think all we should capture is a few random bits - which IS what happens in Wavelab and Pro Tools.
This sounds like a problem in these applications then.

For correct quantization, we need every quantization step to be of equal size. This is not the case if you round towards zero (which is why it really isn't a valid way to quantize), but whether you round to nearest or take floor/ceil doesn't strictly speaking matter (beyond a tiny DC offset if the signal is later interpreted as having been quantized using another rule). What matters is that all steps should be equal.
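A minimal sketch of the equal-step point, with input measured in lsb units. The function names (`quantize_nearest`, `quantize_floor`, `quantize_trunc`, `step_width`) are made up for illustration and are not taken from any host. It measures how wide the input interval is that lands on a given output code:

```c
/* Three candidate quantizers; input x is in lsb units (illustrative names). */
static long quantize_floor(double x)
{
    long n = (long)x;                 /* the cast truncates toward zero */
    return (x < n) ? n - 1 : n;       /* step down for negative fractions */
}

static long quantize_nearest(double x) { return quantize_floor(x + 0.5); }

static long quantize_trunc(double x) { return (long)x; }   /* round toward zero */

/* Width, in lsb, of the input interval that maps to output `code`,
   measured by scanning a fine grid over [-4, 4). */
static double step_width(long (*q)(double), long code)
{
    const double dx = 1.0 / 1024.0;
    long hits = 0;
    for (long i = -4096; i < 4096; ++i)
        if (q(i * dx) == code)
            ++hits;
    return hits * dx;
}
```

With these definitions, round-to-nearest and floor give a 1 lsb wide step for every code, while rounding toward zero gives a roughly 2 lsb wide step at code 0, which is the unequal step the post describes.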

When the quantization steps are equal, 2 lsb TPDF (or really any correct dither, with or without noise shaping) will alternate between multiple levels randomly such that (statistically) the noise spectrum is the same whatever other signals (including the special case of none) are mixed with it prior to the rounding.

Whether the 2 lsb TPDF alternates between 2 or 3 levels depends on the combination of the actual signal and the rounding mode used, but it will always alternate in such a way that the result is the same flat noise floor in the spectral domain. This is what makes a dither work, because it is the distribution of samples that can carry a signal smaller than 1 lsb without spectral distortion.
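As a rough illustration of a sub-lsb signal surviving in the distribution of samples, here is a sketch that quantizes a constant input of 0.25 lsb with 2 lsb peak-to-peak TPDF dither and averages the result. Everything here is an assumption for demonstration purposes: the small LCG stands in for whatever random source a real plugin would use, and `uniform01`, `tpdf_lsb`, `round_nearest` and `dithered_mean` are hypothetical names.

```c
#include <stdint.h>

/* Small deterministic LCG so the sketch is reproducible (a stand-in RNG). */
static uint32_t rng_state = 12345u;

static double uniform01(void)
{
    rng_state = rng_state * 1664525u + 1013904223u;
    return (rng_state >> 8) * (1.0 / 16777216.0);   /* top 24 bits -> [0, 1) */
}

/* TPDF dither, 2 lsb peak-to-peak: the sum of two uniform variables on
   [-0.5, 0.5) has a triangular density over (-1, 1) lsb. */
static double tpdf_lsb(void)
{
    return (uniform01() - 0.5) + (uniform01() - 0.5);
}

/* Round to nearest, computed as floor(x + 0.5). */
static long round_nearest(double x)
{
    double y = x + 0.5;
    long n = (long)y;
    return (y < n) ? n - 1 : n;
}

/* Average of n dithered, quantized samples of a constant input (in lsb). */
static double dithered_mean(double level_lsb, long n)
{
    double sum = 0.0;
    for (long i = 0; i < n; ++i)
        sum += (double)round_nearest(level_lsb + tpdf_lsb());
    return sum / (double)n;
}
```

Without dither, `round_nearest(0.25)` is always 0, so the input is lost. With dither, `dithered_mean(0.25, 200000)` should come out close to 0.25, because the output hops between neighbouring codes in proportions that preserve the input on average.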

Post

Fender19 wrote: Fri Sep 18, 2020 5:39 pm So the solution is to truncate in the plugin and not leave it up to the DAW?
First, I'd say it's safest to, in that regard, but I also think you should truncate in your plugin regardless of whether it's needed. Otherwise, you can't monitor the result until you print it. If you're dithering your output, it's probably because you believe it improves the result—wouldn't you want to hear that result to be sure? I know you said you thought most dither plugins don't do the truncation, but I doubt that's true.
2) the peak level of a 16bit wave file in the positive excursion is 32767 so why are we scaling the dither signal by 1<<15 (32768) and not 1<<15 - 1 (32767) ?
Not sure what you're asking on #1, so I'll skip to #2: It's a tiny deal, for sure, but no, you want the straight shift. Remember, the highest integer value is 2^(bits-1)-1 (32767), but that's something we live with—we'd love for it to be symmetrical, but it's not worth wasting another (partial) bit over. Yet, if you want to scale the output by one-half, you'd >>1, not >>1-1. Don't go down the path of thinking the shortcoming of one level missing on the high end needs to be replicated for other things that aren't subject to that shortcoming. The dither level doesn't care that the max positive value has that limitation. ;-)
My audio DSP blog: earlevel.com

Post

earlevel wrote: Fri Sep 18, 2020 10:28 pm Not sure what you're asking on #1
What I am asking is how a signal that is mostly BELOW the lsb of 16-bit audio is being captured in enough detail in a 16-bit export from Reaper and Cubase that it sounds like noise - and not "speckley" static of only random lsbs from the signal peaks. I have checked and neither DAW is applying dither of its own.

In Wavelab and Pro Tools that export DOES sound like speckley static and I need to raise the dither level by 2x (6dB) to get the same dithering effect.

So there is, apparently, a difference in how various DAWs perform a 32->16 bit reduction - as kryptonaut suggests. Not all of them simply truncate, it seems. Otherwise, :ud:

Post

Fender19 wrote: Fri Sep 18, 2020 11:24 pm What I am asking is how a signal that is mostly BELOW the lsb of 16-bit audio is being captured in enough detail in a 16-bit export from Reaper and Cubase that it sounds like noise - and not "speckley" static of only random lsbs from the signal peaks.
It is because they are doing what they should be doing and rounding to nearest. This means that the breakpoint for crossing out of zero is at +/-0.5lsb.

Any host that does something else is fubar.
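To put a number on the +/-0.5lsb breakpoint, here is a sketch that integrates a triangular (TPDF) density on a grid and reports how often dither alone, with silence as input, produces a nonzero output code. It only counts nonzero codes and says nothing about the noise spectrum. The names (`round_nearest`, `trunc_toward_zero`, `nonzero_fraction`) are hypothetical, and it makes no claim about what any particular DAW does internally.

```c
/* Two quantizers, input in lsb units (illustrative names). */
static long round_nearest(double x)
{
    double y = x + 0.5;
    long n = (long)y;
    return (y < n) ? n - 1 : n;      /* floor(x + 0.5) */
}

static long trunc_toward_zero(double x) { return (long)x; }

/* Probability that TPDF dither with peak amplitude `peak_lsb`, applied to
   silence, yields a nonzero code under quantizer q (midpoint-rule integral). */
static double nonzero_fraction(long (*q)(double), double peak_lsb)
{
    const int cells = 4000;
    double mass = 0.0, nonzero = 0.0;
    for (int i = 0; i < cells; ++i) {
        double u = -1.0 + (i + 0.5) * (2.0 / cells);   /* cell centre in (-1, 1) */
        double w = 1.0 - (u < 0.0 ? -u : u);           /* triangular density */
        mass += w;
        if (q(u * peak_lsb) != 0)
            nonzero += w;
    }
    return nonzero / mass;
}
```

Under these assumptions, rounding to nearest with +/-1 lsb dither gives a nonzero code about a quarter of the time, rounding toward zero gives essentially none, and rounding toward zero with the dither doubled to +/-2 lsb comes back to about a quarter.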

The scaling isn't standardized anywhere though. It's usually 0x7fff, sometimes 0x8000, but really it could be anything that the host developer thought was a good idea. It doesn't even need to be integer, it could be 32768.86723 just as well, even if it probably isn't. If you want to reliably dither in an arbitrary host, you should really measure them all to get data on exactly where each one of the quantization steps starts and ends, then store the data in the plugin so that the plugin can figure out how to convince its current host to output the correct bit-pattern.

Post

mystran wrote: Wed Sep 16, 2020 2:38 am normalized TPDF in range [-1,+1] which is then scaled down by 15 bits, resulting in a 2 lsb spread.
I think this may be the issue/confusion right here. Your “2 lsb spread” is a peak-to-peak amplitude. I believe the commonly stated “dither at 2 lsb” refers to rectified amplitude, not peak-to-peak. In other words, the scale factor should be 1<<14 not 1<<15 which is what the dither in Pro Tools measures (-84dBFS peak).

1<<15 probably works in FL, Reaper and Cubase because, as you suggest, they are rounding 1/2 lsb.

So, my conclusion is to use 1<<14 (-84dBFS) peak level noise and it should work in all DAWs. No need to measure every one.
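For reference, the two peak levels quoted in this thread are just the scale factors expressed in dB relative to a full scale of 1.0. This is only the arithmetic and does not settle which scaling is the right one:

$$
20\log_{10}\!\left(2^{-15}\right) = -15 \times 20\log_{10} 2 \approx -15 \times 6.0206 \approx -90.3\ \text{dBFS}
$$

$$
20\log_{10}\!\left(2^{-14}\right) = -14 \times 20\log_{10} 2 \approx -14 \times 6.0206 \approx -84.3\ \text{dBFS}
$$

The two differ by one factor of 2, i.e. about 6.02 dB, which matches the "2x (6dB)" adjustment mentioned earlier.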

Post

Fender19 wrote: Sat Sep 19, 2020 5:53 am
mystran wrote: Wed Sep 16, 2020 2:38 am normalized TPDF in range [-1,+1] which is then scaled down by 15 bits, resulting in a 2 lsb spread.
I think this may be the issue/confusion right here. Your “2 lsb spread” is a peak-to-peak amplitude. I believe the commonly stated “dither at 2 lsb” refers to rectified amplitude, not peak-to-peak.
At this point I suggest it might be helpful for you to actually take a more formal look at the theory, see for example http://www.robertwannamaker.com/writings/rw_phd.pdf

Post

mystran wrote: Sat Sep 19, 2020 2:39 am The scaling isn't standardized anywhere though. It's usually 0x7fff, sometimes 0x8000, but really it could be anything that the host developer thought was a good idea.
I think I remember reading somewhere that it actually is, but I don't remember what the standard factor was, nor do I really care ATM ;) I won't be surprised tho, if this standard is not really being followed.

Post

Z1202 wrote: Sat Sep 19, 2020 10:42 am
mystran wrote: Sat Sep 19, 2020 2:39 am The scaling isn't standardized anywhere though. It's usually 0x7fff, sometimes 0x8000, but really it could be anything that the host developer thought was a good idea.
I think I remember reading somewhere that it actually is, but I don't remember what the standard factor was, nor do I really care ATM ;) I won't be surprised tho, if this standard is not really being followed.
The question is, what standard would even be applicable? I have on various occasions tried to research the subject, but so far I have not been able to find anything beyond pure personal opinions and I would love to be pointed towards anything more formal.

On the subject of rounding: https://www.cs.cmu.edu/~rbd/papers/cmj- ... o-int.html

Post

Looking at this once more, recent post viewtopic.php?p=7825566#p7825566 gives AES17-1998 and IEC 61606-3 as references. I haven't seen these referred to anywhere else, but apparently someone knew about them. These are specifically audio measurement standards, so I suppose that makes them applicable.

Both of them are very explicit that 0dBfs is the maximum positive value and in 2's complement the maximum negative value is unused. The latter even spells out the bit-patterns to make sure. At face value (ie. assuming "just reaches" can be interpreted as reaching the mid-point of the quantization interval, rather than the level-threshold), this implies that one should scale by 0x7fff for 16-bit when [-1,+1] is taken as the floating-point full-scale.

So there you go: scaling by 0x8000 is wrong and any software that acts in such a way is FUBAR.
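To make the two scalings concrete, here is a sketch of a float to 16-bit conversion with the scale factor passed in. The function name `float_to_int16` and the choice of round-to-nearest plus clamping are assumptions for illustration, not a description of any particular host.

```c
#include <stdint.h>

/* Convert a float sample (nominal full scale [-1, +1]) to a 16-bit code:
   scale, round to nearest, then clamp to the representable range.
   `scale` would be 32767.0 (0x7fff) or 32768.0 (0x8000) in this discussion. */
static int16_t float_to_int16(double x, double scale)
{
    double y = x * scale + 0.5;
    long n = (long)y;
    if (y < n)
        --n;                       /* n = floor(y), i.e. round half up */
    if (n > 32767)
        n = 32767;                 /* clamp positive overs */
    if (n < -32768)
        n = -32768;                /* clamp negative overs */
    return (int16_t)n;
}
```

With a scale of 32767.0, inputs of +1.0 and -1.0 map to 32767 and -32767, so the code -32768 is never produced by in-range input. With 32768.0, -1.0 maps to -32768 and +1.0 is clamped to 32767, the same way an input of 1.1 would be.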

Post

mystran wrote: Sat Sep 19, 2020 1:17 pm Both of them are very explicit that 0dBfs is the maximum positive value and in 2's complement the maximum negative value is unused. The latter even spells out the bit-patterns to make sure. At face value (ie. assuming "just reaches" can be interpreted as reaching the mid-point of the quantization interval, rather than the level-threshold), this implies that one should scale by 0x7fff for 16-bit when [-1,+1] is taken as the floating-point full-scale.

So there you go: scaling by 0x8000 is wrong and any software that acts in such a way is FUBAR.
Certainly a valid idea, and I want to be clear I'm not arguing with "you" on this, or about whether it's the accepted practice. My point is about why I think the reasoning behind doing it amounts to a solution for a shortcoming that is not a problem.

First, we're talking about audio here. There is virtually no reason to have an audio signal run the full [-1,+1] allowed, to the DAC. In fact it would almost certainly be in error anyway (in general, any signal you generate with that range will exceed that range in the analog domain due to intersample overs). There is no significant downside, for audio purposes, of the lack of "+1.0" being available. Certainly no downside that's made better by deeming -1.0 "unused". We can't use +1.1 or -1.1 either, even though we can compute them, they aren't available for output—so we don't use them.

Just my opinion, so if someone points me to some standard or accepted practices docs, I don't care. :wink:
Last edited by earlevel on Sat Sep 19, 2020 6:31 pm, edited 1 time in total.

Post

mystran wrote: Sat Sep 19, 2020 1:17 pm So there you go: scaling by 0x8000 is wrong and any software that acts in such a way is FUBAR.
Another part of the confusion (at least for me) is that 16bit audio is in two's complement format, not signed bit format. So for example, when converting a normalized float to a two's complement int you don't just do this:

Code: Select all

integernumber = (1<<15) * normalizedfloatnumber;
The correct conversion is:

Code: Select all

integernumber = (1<<15) * floatnumber;
if (floatnumber < 0.) integernumber -= 1;
"Zero" is part of the positive range of the int.

I almost never work with integers so that second line of code was news to me. Maybe some DAWs don't get this right either in their 32float->16bit int exports?

Makes my head hurt... :?

Post

Fender19 wrote: Sat Sep 19, 2020 6:40 pm Another part of the confusion (at least for me) is that 16bit audio is in two's complement format, not signed bit format. So for example, when converting a normalized float to a two's complement int you don't just do this:

Code: Select all

integernumber = (1<<15) * normalizedfloatnumber;
The correct conversion is:

Code: Select all

integernumber = (1<<15) * floatnumber;
if (floatnumber < 0.) integernumber -= 1;
"Zero" is part of the positive range of the int.
No. Just get used to the fact that the available range of your DAC (or integer sample file) is asymmetrical. It doesn't mean your audio has to be! The "(1<<15) *" is simply scaling to the integer range, which you probably aren't using anyway. There is no need to offset or stretch anything beyond that.

Think of it this way. The output of your process may produce 1.0. But it could also produce 1.1 (maybe it's a parametric filter plugin). If it does, you'd have no problem clipping it to 32767 to make a 16-bit DAC happy. So why the anguish over doing the same with 1.0? Way too much worry about 1.0 not existing in a DAC.
