Conversion between float and integer in a DAW

DSP, Plugin and Host development discussion.

Post

Recently I was wondering how and when conversion between types is happening as the data flows through a DAW and to an audio interface.
Say I have recorded a file in 24bit int. Now when that gets passed through the mixer and plugins, it must somehow get converted to float values, and in the end, when passing the data to the audio interface, back to integer. Where does that happen? In the DAW, or the audio driver?
And how exactly does it work? If I remember correctly, values in audio usually range between -1.0 and +1.0. How does that get converted to a number between -2**23 and +2**23? How do 24bit values get stored inside the software anyway? I don't know of any data type that is three bytes long, at least not from my very limited knowledge of C.
Just curious. Maybe some of the clever people here can enlighten me. :ud:

Post

I had a similar question semi-recently:

viewtopic.php?p=8147097
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

I think most DAWs use floats internally, which are 4-byte numbers. Even though they are 32 bits in memory, they only have 24 bits of precision (a 23-bit stored mantissa plus an implicit leading bit), since they are stored as mantissa and exponent, and the exponent doesn't contribute to the precision. Some DAWs use doubles internally, which are 64 bits and have 53 bits of precision. Originally the plugin APIs like VST were float-only; later they got double extensions, but many plugins still don't support it.

So when your DAW loads your wave file, it converts it to float, passes it through your plugins as float, mixes it as float and then passes it to your audio driver. Some of the older OS audio APIs took 16 bit ints. Others are more configurable. Once the OS gets the data in whatever format it wants, it will get converted into the proper format for the hardware.
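As a rough sketch of that first conversion step, unpacking little-endian packed 24-bit PCM into floats might look something like this (illustrative code, not any particular DAW's internals):

```cpp
#include <cstdint>
#include <cstddef>

// Sketch: unpack little-endian packed 24-bit PCM into floats in [-1, 1).
// Assumes `in` holds 3*numSamples bytes of packed samples.
void pcm24_to_float(const uint8_t* in, float* out, size_t numSamples)
{
    for (size_t i = 0; i < numSamples; ++i) {
        // Assemble three bytes (LSB first) into a 32-bit int...
        int32_t s = (int32_t)in[3*i]
                  | ((int32_t)in[3*i + 1] << 8)
                  | ((int32_t)in[3*i + 2] << 16);
        // ...and sign-extend from 24 to 32 bits.
        if (s & 0x800000)
            s -= 0x1000000;
        out[i] = (float)s * (1.0f / 8388608.0f); // scale by 1/2^23
    }
}
```

Going the other way is the same scaling in reverse, plus rounding back to integers.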

A lot of DAWs are moving to doubles for their signal path now; whether you can hear the difference is up for debate.

Post

Music Engineer wrote: Tue May 10, 2022 9:03 pm I had a similar question semi-recently:

viewtopic.php?p=8147097
Thank you, yes, very interesting! So it is actually possible to convert between float and (24bit) int without any truncation happening?
Background: in another forum there was a discussion whether to dither from float to 24bit int, and as usual with that topic, there are a lot of uninformed opinions (including mine) and no facts.

Post

fese wrote: Wed May 11, 2022 8:08 am Background: in another forum there was a discussion whether to dither from float to 24bit int, and as usual with that topic, there are a lot of uninformed opinions (including mine) and no facts.
You should always dither using TPDF (well, unless you're just doing a round trip to floats and back with no processing, in which case dithering obviously won't do anything useful). You can optionally also apply noise shaping, but that comes with its own set of tradeoffs. The only question is whether the quantization noise at 24bit is such that it'll matter in practice, but from a theoretical point of view dithering is the correct thing to do and it will absolutely never hurt to do so.
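A minimal sketch of what that might look like at the quantization step (the function name, RNG choice and scaling are illustrative):

```cpp
#include <cstdint>
#include <cmath>
#include <random>

// Sketch: quantize a float sample in [-1, 1] to a 24-bit int with
// TPDF dither (sum of two uniform values of +/- 0.5 LSB each).
int32_t quantize24_tpdf(float x, std::mt19937& rng)
{
    std::uniform_real_distribution<float> u(-0.5f, 0.5f);
    float dither = u(rng) + u(rng);          // triangular PDF, +/- 1 LSB peak
    float scaled = x * 8388608.0f + dither;  // 2^23 steps; dither is in LSBs
    long  q = std::lround(scaled);
    if (q >  8388607) q =  8388607;          // clip to the 24-bit range
    if (q < -8388608) q = -8388608;
    return (int32_t)q;
}
```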

ps. The 2lsb TPDF need not necessarily be white, so for example generating 1lsb RPDF and then taking the difference between the current and previous random value produces "blue" 2lsb TPDF where there is slightly more energy at high frequencies and none at DC, which often works great for audio and generally avoids the problems of actual noise shaping.
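A minimal sketch of that differentiated-RPDF idea (the struct name is made up for illustration):

```cpp
#include <random>

// Sketch: 1lsb RPDF differentiated into "blue" 2lsb TPDF dither.
// The first difference has a spectral null at DC and a gentle
// high-frequency tilt, while the peak stays at +/- 1 LSB.
struct BlueTpdfDither
{
    std::mt19937 rng{12345};
    std::uniform_real_distribution<float> u{-0.5f, 0.5f}; // 1lsb RPDF
    float prev = 0.0f;

    float next() // returns dither in LSB units
    {
        float r = u(rng);
        float d = r - prev; // current minus previous -> blue TPDF
        prev = r;
        return d;
    }
};
```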

pps. Don't forget that while at full amplitude a 32-bit floating point number might not have enough precision in the mantissa for dither to make any difference, in practice as soon as the sample values are not near the maximum there is plenty of additional precision and dithering becomes useful... and it's specifically at the quiet parts of the signal where the potential problems of not dithering are also the worst.

Post

fese wrote: Wed May 11, 2022 8:08 am Thank you, yes, very interesting! So it is actually possible to convert between float and (24bit) int without any truncation happening?
To store samples in the [-1:1] range, both representations have the same precision and the resulting noise is at most -144dB.

Any 24-bit integer (actually fixed-point number covering samples in the [-1:1] range) has an exact float representation. In other words, it may be converted to float and back to integer with no truncation.
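Since there are only 2^24 such values, this is easy to verify exhaustively (assuming the usual scale-by-2^23 convention); a quick sketch:

```cpp
#include <cassert>
#include <cstdint>

// Sketch: every 24-bit integer survives the round trip to float and
// back, because any 24-bit value fits exactly in float's 24-bit
// significand, and scaling by a power of two only changes the exponent.
int main()
{
    for (int32_t i = -8388608; i <= 8388607; ++i) {
        float f = (float)i / 8388608.0f;          // int -> float in [-1, 1)
        int32_t back = (int32_t)(f * 8388608.0f); // float -> int
        assert(back == i);                        // exact, no truncation
    }
    return 0;
}
```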

On the other hand, there are floating point numbers in the [-1:1] interval that have no integer representation: samples very close to 0 would be truncated. But such samples are below the noise threshold (-144dB). Dithering improves the integer representation of such small samples at the expense of adding noise (slightly reducing overall precision).

Post

ratchov wrote: Fri May 13, 2022 9:00 am On the other hand, there are floating point numbers in the [-1:1] interval that have no integer representation: samples very close to 0 would be truncated. But such samples are below the noise threshold (-144dB). Dithering improves the integer representation of such small samples at the expense of adding noise (slightly reducing overall precision).
This is just not quite right. Floating point gains one bit of precision relative to full scale every time you halve the magnitude. Half magnitude is about -6dB gain and one bit is about 6dB of dynamic range. This means that while integers have a fixed dynamic range with respect to full scale, floating point has a more or less fixed dynamic range with respect to the sample amplitude.
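A quick way to see this (a sketch; the amplitude steps are arbitrary) is to print the spacing between adjacent floats at a few amplitudes next to the fixed 24-bit step:

```cpp
#include <cmath>
#include <cstdio>

// Sketch: one float ulp shrinks as the amplitude drops, while the
// 24-bit integer step stays fixed at 2^-23 of full scale.
int main()
{
    const double step24 = 1.0 / 8388608.0; // fixed 24-bit step, ~1.19e-7
    for (float amp = 1.0f; amp > 1e-4f; amp *= 0.125f) { // about -18 dB per step
        float ulp = std::nextafter(amp, 2.0f * amp) - amp;
        std::printf("amp %.6f  float step %.3g  24-bit step %.3g\n",
                    amp, ulp, step24);
    }
    return 0;
}
```

At full scale the two steps match; at -18 dB the float step is already 8 times finer, and so on.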

So you absolutely do not need the sample to be "very close to 0" for (even single-precision) floating point to have a ton more precision than a 24-bit integer. In fact, unless you're pushing your signal with a maximizer into glorified square waves, just about all of your samples are likely to have significantly more precision in float.

Finally, saying that "dither slightly reduces overall precision" is absolutely backwards. What dithering specifically does is increase precision (at least in the frequency domain, which is what matters) at the cost of slight noise. The somewhat unintuitive thing about dither is that the additive noise floor will be at a lower level than where your quantization artifacts would have been (by something like ~20dB, so it's not even subtle). So even if we treat the noise as "error" we're still gaining precision and this is why dither is the correct thing to do.

Post

The best way to think about dither is that it kinda distributes the energy of quantization errors evenly across the frequency spectrum (or unevenly with noise shaping), thus reducing peaks in the error spectrum.

Post

mystran wrote: Fri May 13, 2022 9:45 am So even if we treat the noise as "error" we're still gaining precision and this is why dither is the correct thing to do.
And yet, audio expert Ethan Winer has for years now had a challenge posted on his web site in which an audio track is partly dithered and partly untouched by dithering. The challenge is for anybody to tell him what part is dithered and what is not. As far as I know, there still have been no right answers.

Mr. Winer would be the first to say that dithering does no harm. But he's convinced me, at least, that in the end, people simply cannot tell the difference if it's used or not, except perhaps in the most unusual, unrealistic, circumstances.

Post

With graphics, it's easy to see how dithering works well; see here:
https://www.shadertoy.com/view/4t23RW
With audio it’s a temporal blur. So the details from missing data are distributed across many samples.
Personally, I think it changes the perceived sound, but I’m yet to be convinced it’s better. I suppose it must be, because of the stats, right?

Post

quikquak wrote: Mon May 16, 2022 12:36 pm With graphics, it's easy to see how dithering works well; see here:
https://www.shadertoy.com/view/4t23RW
With audio it’s a temporal blur. So the details from missing data are distributed across many samples.
Personally, I think it changes the perceived sound, but I’m yet to be convinced it’s better. I suppose it must be, because of the stats, right?
I've actually spent some time exploring 2D dithers for images too. For images, straight white TPDF is less than ideal though, because low-frequency noise tends to be a lot more visible than higher-frequency noise (which arguably is true for audio too, but it's not quite so bad)... so it makes sense to try and generate something with a bluish spectrum.

Here's a (rather expensive) hash-based dither that builds a TPDF with (slightly non-linear) filtering of the 4 axial neighbours: https://www.shadertoy.com/view/wlccRX

That strategy can actually be used to generate bluish RPDF too for things like stochastic depth sampling (e.g. shading volumetrics and the like; same problem, you want to get rid of low frequencies)... but it's a bit expensive to compute (5 hash functions per pixel), so here's an alternative strategy that combines straight TPDF with a 2x2 ordered dither: https://www.shadertoy.com/view/sdd3DM

Post

Very nice. I’ve not tried blue noise on audio myself yet. Do you find it works well for audio, or is it just less obvious or effective than white noise?

Post

quikquak wrote: Mon May 16, 2022 6:11 pm Very nice. I’ve not tried blue noise on audio myself yet. Do you find it works well for audio, or is it just less obvious or effective than white noise?
Well, I like the "generate RPDF, differentiate for TPDF" approach: it puts a bit more noise at high frequencies and a bit less at low frequencies without being drastically different from white noise, it's stupidly cheap to do, and it works as well as white TPDF (in both cases assuming a somewhat decent PRNG) without any of the ugly stuff you get with noise shaping (i.e. it's still just 2lsb peak).
