## When are double-precision samples needed?

KVRist

30 posts since 22 Dec, 2010
aciddose wrote:actually, that is true.

Hard to know what you're referring to without a quote. I hope you're not referring to my last sentence (if so, you didn't get my point; if thermal noise is a problem (and it already dominates waaaayyy before you get out to 64 bits), adding additional bits does nothing to help in that regard, so saying it might not be enough makes no sense).
DaveHoskins
KVRian

616 posts since 7 Jan, 2009, from Gloucestershire
thevinn wrote:Under what circumstances is it important to represent and process samples in double precision floating point representation (versus single precision)?

Are you going to record the samples? If so, with what? The noise floor certainly drops off as you get right up on absolute zero, but I'm just not sure how you're going to manage that recording session...

So, are you going to calculate the samples? OK, you can do that, but what are you going to play them back on?

Single precision floating point gives you 25 bits of precision (23-bit mantissa, one implied bit due to normalization, and a sign bit). That's 150.5 dB. If your output is 1 V p-p, the least significant bit would be, what, less than a tenth of a microvolt? What do you think the background (thermal) noise of your gear is?
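
For anyone who wants to check those figures, here is a small Python sketch of the arithmetic. It assumes the usual 20·log10(2) ≈ 6.02 dB per bit and a 1 V peak-to-peak output divided evenly into 2^25 steps; the variable names are only illustrative.

```python
import math

BITS = 25            # 23 stored mantissa bits + implied bit + sign bit
FULL_SCALE_V = 1.0   # assumed 1 V peak-to-peak output

# Dynamic range of BITS bits, in decibels.
dynamic_range_db = 20 * math.log10(2 ** BITS)

# Size of one least-significant step at that full-scale voltage.
lsb_volts = FULL_SCALE_V / 2 ** BITS

print(f"{dynamic_range_db:.1f} dB")           # 150.5 dB
print(f"{lsb_volts * 1e9:.1f} nV per step")   # 29.8 nV per step
```

So one step is about 30 nanovolts, comfortably under a tenth of a microvolt.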

Sure, there are reasons to do double precision math (high-order IIR filters, where small errors in coefficients and calculations in the feedback can become large, for instance). There's always a way around needing double precision, but the bottom line is that double precision is usually next to free (most processors already work in double precision), though it can be costly if you're going to store everything in double-precision.

Plus, it sounds really good to some people to hear, "The audio signal path is 64-bit from input to output, for maximum resolution and headroom", "...delivers the sonic purity and headroom of a true 64-bit, floating-point digital audio path", "...'s 64-bit asynchronous audio engine with industry leading audio fidelity, 64-bit audio effects and synths, the forthcoming...which provides native 64-bit ReWire integration, allow for an end-to-end 64-bit audio path". Yeah, I did a search and pulled those marketing quotes from real products. What happens when a software maker starts touting extended precision 80-bit floating point (not nearly-free)? Will everyone rush to 80-bit?

Plus, I'm sure that the issue of a workstation's (and plug-ins') support of 64-bit (memory addressing) operating system adds a layer of confusion to some people.

BTW, this topic made me look up some references, such as an article explaining why 64-bit is a good idea. All of the arguments I've seen use flawed logic, showing a misunderstanding of the math, such as explaining the range of 64 bits as 2 to the 24th power, and saying it helps guard against overflow (hello, it's floating point...). Another said that 64-bit floating point might not even be enough for a clean sound, because there's the problem of thermal noise (!!!).

I've already mentioned the power of suggestion, although I took the leap to 128 bits before you, so I guess my plug will sound better, as 'more is more', right?
The only time I've noticed a problem is with IIRs and oscillators that use 2D rotation techniques. But I do wonder what the overall problems are if, say, a DAW is playing 128 tracks added up, that all have effects added in. Wouldn't all those additions and multiplications have accumulated problems?

.
jkleban
KVRian

632 posts since 4 Mar, 2007
Not sure if this applies, but when you mathematically change the NOTES of sampled instruments, lowering the pitch of the samples, their smoothness would diminish, since lowering the pitch is in essence changing the playback sample rate... so the higher the bit rate of the sample, the better the pitch-shifted samples, no?

I do admit that I really don't know what I am talking about but the theory above does have some logic behind it.

Jim
The keeper of the Shrine.
http://lldom.blogspot.com
The Lamb Laid Down on MIDI
DaveHoskins
KVRian

616 posts since 7 Jan, 2009, from Gloucestershire
jkleban wrote:Not sure if this applies, but when you mathematically change the NOTES of sampled instruments, lowering the pitch of the samples, their smoothness would diminish, since lowering the pitch is in essence changing the playback sample rate... so the higher the bit rate of the sample, the better the pitch-shifted samples, no?

I do admit that I really don't know what I am talking about but the theory above does have some logic behind it.

Jim

When a sample is played back at a slower rate, it is stepped through using interpolation. Look up 'audio interpolation' on the web for tons of info.
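
To make the idea concrete, here is a minimal Python sketch of stepping through a sample at a fractional rate. It uses linear interpolation, which is only the simplest option (real samplers often use higher-order or windowed-sinc interpolation), and the function name `play_at_rate` is made up for this illustration.

```python
def play_at_rate(samples, rate):
    """Read `samples` at `rate` source samples per output sample,
    linearly interpolating between neighbours (rate < 1 lowers the pitch)."""
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos <= last:
        i = int(pos)                      # sample at or before the read position
        frac = pos - i                    # how far we are towards the next sample
        nxt = samples[min(i + 1, last)]   # clamp at the final sample
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += rate
    return out

# Half speed (one octave down): every gap gets one in-between value.
print(play_at_rate([0.0, 1.0, 2.0, 3.0], 0.5))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

The in-between values are computed at playback time, so the smoothness depends mostly on the interpolator rather than on how many bits each stored sample has.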
JonHodgson
KVRian

770 posts since 1 Apr, 2003
DaveHoskins wrote:I've already mentioned the power of suggestion, although I took the leap to 128 bits before you, so I guess my plug will sound better, as 'more is more', right? .

You're both behind the curve.

My plugins are going to be 127 bits from now on, because you have to use a prime number of bits if you want to avoid audible resonances in the maths.

thevinn
KVRian

775 posts since 30 Nov, 2008
Okay so just to be clear, the consensus is that storing sample arrays using double precision floating point representation provides no tangible benefit over single precision?

The corollary is that intermediate calculations should be done in double precision: for example, filter coefficients.
antto
KVRAF

2495 posts since 4 Sep, 2006, from 127.0.0.1
yup, for storage (of audio signals) 32bit float is enough
i even use 16bit (truncated mantissa) float in some situations, to save space..
It doesn't matter how it sounds..
..as long as it has BASS and it's LOUD!

irc.freenode.net >>> #kvr
andy-cytomic
KVRAF

2080 posts since 3 Dec, 2008
antto wrote:it might make sense in a modular host i guess
..if you're able to make feedback loops between the plugins..

No, not even then. It is only things like the intermediate internal results of additions, subtractions, and accumulations (and then further values that result from these as an input) that need double precision in certain situations.
The Glue, The Drop - www.cytomic.com
andy-cytomic
KVRAF

2080 posts since 3 Dec, 2008
thevinn wrote:Okay so just to be clear, the consensus is that storing sample arrays using double precision floating point representation provides no tangible benefit over single precision?

The corollary is that intermediate calculations should be done in double precision: for example, filter coefficients.

Yes to the first bit, but for the second bit you only need double precision for certain filter structures under certain circumstances. I recommend never using a direct form 1 or 2 biquad, period, but if you must then you really do need double precision for all coefficients and memory, since the structure is so bad it adds significant quantization error to your signal, as well as the cutoff being nowhere near where you want it. Check out the plots I've done here that compare different digital linear filter topologies: www.cytomic.com/technical-papers
The Glue, The Drop - www.cytomic.com
Jesse J
KVRist

323 posts since 2 Oct, 2002, from Finland, Europe
Do some advanced wavefolding and/or FM and you'll probably benefit from the added precision a lot.
andy-cytomic
KVRAF

2080 posts since 3 Dec, 2008
Thanks for your post Codehead, I agree with everything you have posted, but just want to make a clarification on the point below specifically to do with the word "usually":

codehead wrote: There's always a way around needing double precision, but the bottom line is that double precision is usually next to free (most processors already work in double precision), though it can be costly if you're going to store everything in double-precision.

Vector instructions on Intel chips can process twice as many floats as doubles. The same holds on Texas Instruments floating point DSP chips, which can also process twice as many floats as doubles, although it is a very different architecture. If you are calculating scalar results on an Intel chip, and memory / cache issues don't come into it, then yes a float op takes the same time as a double op, and you just throw away the other 3 float ops you could have used.

In my experience you get around a 2x to 3x speedup from writing vectorized float SSE code that computes 4 floats at once, as there is usually some data juggling and other overhead involved in using SSE.
The Glue, The Drop - www.cytomic.com
KVRist

30 posts since 22 Dec, 2010
DaveHoskins wrote:But I do wonder what the overall problems are if, say, a daw is playing 128 tracks added up, that all have effects added in. Wouldn't all those additions and multiplications have accumulated problems?
.

Yes they do, but note that while the summed error from the digits at the least significant bits grows, so does the summed signal with the more significant non-error bits. That is, if you add two similar values, you have to allow for the possibility that the error doubles, essentially moving to the left one bit. But you have to allow for the total result to grow one bit too. So in the completely general case, you'd then divide the result by two, so that your signal doesn't grow, and you've found that now the error didn't grow either.

Extend that to 128 tracks—your error might grow 128x, but so does your signal, so you didn't change the signal to noise ratio.
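
A rough numerical check of that argument, as a Python sketch. The assumptions are mine: each track is independent uniform noise, each track carries only its own float32 rounding error, and the mix itself is summed in double precision so no further error is added along the way. The helpers `to_float32`, `rms` and `snr_db` are made up for this illustration.

```python
import math
import random
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def snr_db(n_tracks, n_samples=2000, seed=1):
    """SNR of a mix of n_tracks float32 tracks, summed in double precision."""
    rng = random.Random(seed)
    signal, error = [], []
    for _ in range(n_samples):
        exact = [rng.uniform(-1.0, 1.0) for _ in range(n_tracks)]
        stored = [to_float32(v) for v in exact]    # per-track rounding error
        signal.append(sum(exact))
        error.append(sum(s - e for s, e in zip(stored, exact)))
    return 20 * math.log10(rms(signal) / rms(error))

snr_1 = snr_db(1)
snr_128 = snr_db(128)
print(f"1 track: {snr_1:.1f} dB, 128 tracks: {snr_128:.1f} dB")
```

With uncorrelated tracks both the signal and the error grow by about the square root of the track count rather than by 128x, but the conclusion is the same: the two ratios come out close to each other.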
KVRist

30 posts since 22 Dec, 2010
andy_cytomic wrote:Vector instructions on Intel chips can process twice as many floats as doubles.

Excellent point—yes, I was considering normal CPUs (not DSPs), but I forgot about vector processing. I guess the bottom line is that if a processing unit is built to handle only one floating point operation at a time, double will execute as fast as single; but if it's built to do multiple parallel operations, it's going to optimize real estate...
aciddose
KVRAF

11810 posts since 7 Dec, 2004
aciddose wrote:actually, that is true.

Hard to know what you're referring to without a quote. I hope you're not referring to my last sentence (if so, you didn't get my point; if thermal noise is a problem (and it already dominates waaaayyy before you get out to 64 bits), adding additional bits does nothing to help in that regard, so saying it might not be enough makes no sense).

because of noise, you get dither. when you average dither the number of bits increases by one every time you filter away 6db of noise.

that means we can for example input a signal to a 1-bit ADC (a compare > 0) and mix it with high-frequency noise.

by filtering away this noise with a low-pass filter, every time we decrease the noise level by half we gain one bit.

let's say we use 40khz sample rate. we want at least 1khz frequency for our input.

to get 25 bits with a cutoff of 1khz, we need a filter with at least 30db/o slope. easy to achieve.
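
A toy Python sketch of the 1-bit idea, under simplifying assumptions that are mine rather than the post's: the input is a constant level instead of a 1 kHz signal, the dither is uniform noise spanning full scale, and the low-pass filter is replaced by a plain average. The function name `one_bit_average` is made up for this illustration.

```python
import random

def one_bit_average(level, n_samples=100_000, seed=1):
    """Estimate `level` (between -1 and 1) from a dithered 1-bit comparator."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        noise = rng.uniform(-1.0, 1.0)            # assumed dither: uniform, full scale
        total += 1 if level + noise > 0 else -1   # the 1-bit "ADC"
    return total / n_samples                      # crude low-pass: a plain average

estimate = one_bit_average(0.3)
print(f"true level 0.3, recovered {estimate:.3f}")
```

Each output on its own is just +1 or -1, yet the average lands close to the input level. With a plain average like this one, the residual noise only halves for every fourfold increase in the number of samples, so a steeper filter or noise shaping is what makes the approach practical.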

also, yes float does have "virtually" 25 bits, but it isn't because of any phantom bit. it's because we take away half the range of precision from the exponent and apply that to the mantissa by normalizing the mantissa to a half range. (0.5 - 1.0, not 0.0 to 1.0.)

so yes it's true to say that float has at least 25 bits accuracy in a normalized range like 0.0 - 1.0, but it isn't smart to say this accuracy comes from magic; which is what "normalization and implied bit" tends to sound like. easier to just describe it as having 23 bits that apply to half as much range.
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
KVRist

30 posts since 22 Dec, 2010
thevinn wrote:Okay so just to be clear, the consensus is that storing sample arrays using double precision floating point representation provides no tangible benefit over single precision?

The corollary is that intermediate calculations should be done in double precision: for example, filter coefficients.

Right. Now, to elaborate on filters a little: (Don't read that as saying this is the only case where you need that extra precision, by any means; I'm just addressing why you'd need the extra precision in part of the math, but not necessarily the entire audio path.) First, it's the feedback path that's the issue. You're multiplying an output sample by a coefficient and feeding it back to get summed with the input (which produces a new output, which gets multiplied by the coefficient again, which feeds back to the input...).

So, you have an audio sample and a coefficient. The audio sample by itself is suitable for playback in single precision. The coefficient may or may not be adequate at single precision; it depends on the filter type, order, and filter setting relative to the sample rate. For instance, a direct-form IIR requires more and more digits of precision as you move a pole to a frequency that's smaller relative to the sample rate. Put another way, if you use, say, 8 bits to the right of the decimal, as you count down through all the possible values for that coefficient, the corresponding pole positions (where the feedback occurs in frequency) spread out more and more. So, you can get into a situation where a pole isn't really where you specified it: it's been quantized to a less desirable position, maybe even on the unit circle, yielding stability problems. Higher order filters will have more problems, because the poles won't be in the correct spots relative to each other. And at higher sample rates, the problem gets worse, because setting a filter to 100 Hz at 192k is much worse than setting it to 100 Hz at 44.1k.
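
Here is a small Python sketch of that spreading-out effect. It assumes a two-pole section with denominator 1 + a1·z^-1 + a2·z^-2 and a pole pair at radius r and angle theta, so that a1 = -2·r·cos(theta) and a2 = r². The radius of 0.999, the 16 fractional bits and the function name `neighbour_pole_hz` are arbitrary choices for the illustration, not values from the post.

```python
import math

def neighbour_pole_hz(freq_hz, sample_rate, frac_bits, radius=0.999):
    """Frequency of the pole pair reached by nudging a1 one quantization step.

    Assumes a two-pole section with denominator 1 + a1*z^-1 + a2*z^-2,
    where a1 = -2*radius*cos(theta) and a2 = radius**2 is left unchanged.
    """
    theta = 2 * math.pi * freq_hz / sample_rate
    a1 = -2 * radius * math.cos(theta)
    step = 2.0 ** -frac_bits                  # one coefficient quantum
    cos_next = -(a1 + step) / (2 * radius)    # cosine of the pole angle after the nudge
    return math.acos(cos_next) * sample_rate / (2 * math.pi)

for fs in (44_100, 192_000):
    gap = neighbour_pole_hz(100.0, fs, frac_bits=16) - 100.0
    print(f"fs = {fs}: next available pole is {gap:.1f} Hz away")
```

At 44.1k the neighbouring pole frequency is a few Hz away from 100 Hz; at 192k it is tens of Hz away, which is the "much worse" case described above.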

For the coefficients, you could just go with double precision, problem solved. But it's not the only thing you can do: there are other filter forms that are equivalent, but either have more homogeneous quantization error (the coefficient spacing is the same everywhere), or have the opposite sensitivity (higher density at the low end, worse at the high end, but that's a good tradeoff if it's a low-end filter).

The other part is the math and the feedback. Part one is: When you multiply two numbers, you end up with more bits. If you multiply two floats, it should take a double to hold the result. The fact that in modern hardware float * float = float means that the precision is getting truncated, and that's error. The higher the IIR order, the more sensitive to that error. You can go all-double in your calculation, or you can do some other tricks where you essentially noise-shape that error by saving and feeding it back in with another filter. Part two has to do with adding: if you add a very small float to a large one, the small one disappears because there aren't enough mantissa bits (again that can be fixed with a better filter form).
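
Both parts can be seen with a couple of hand-picked values. This Python sketch emulates single precision by rounding through the standard `struct` module (Python's own floats are doubles); `to_float32` is a helper made up for the illustration, and the specific numbers are chosen only because they make the lost bits easy to see.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Part one: the product of two floats needs more bits than a float has.
a = to_float32(1.0 + 2.0 ** -23)       # 1 plus one float32 step
exact_product = a * a                  # held exactly by a double
float_product = to_float32(exact_product)
print(exact_product - float_product)   # the part that float * float drops

# Part two: a small addend vanishes next to a large one.
big, small = to_float32(1.0), to_float32(1e-8)
print(to_float32(big + small) == big)  # True: the 1e-8 is lost entirely
```

A single lost term like this is tiny; it is the repetition inside a feedback loop that lets the losses matter.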

I'm writing a book here, sorry. The bottom line is that you can either twiddle your architecture to be less error-sensitive, or just up the precision with doubles. But, you can just keep that inside the filter, and pass on the output sample as a float.
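
As a sketch of "doubles inside, float at the output", here is a one-pole smoother run with single-precision and with double-precision internal state. The smoother form y += c·(x - y), the coefficient 1e-4 and the helper names are my own choices for the illustration, and single precision is emulated by rounding every intermediate result through `struct`.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def settle(coeff, n_steps, single_precision_state):
    """Run y += coeff * (x - y) towards x = 1.0 and return the final output."""
    rnd = to_float32 if single_precision_state else (lambda v: v)
    c, x, y = rnd(coeff), 1.0, 0.0
    for _ in range(n_steps):
        y = rnd(y + rnd(c * rnd(x - y)))   # every intermediate result is rounded
    return to_float32(y)                   # hand the result on as a float either way

out_float = settle(1e-4, 300_000, single_precision_state=True)
out_double = settle(1e-4, 300_000, single_precision_state=False)
print(f"float state falls short by  {1.0 - out_float:.2e}")
print(f"double state falls short by {1.0 - out_double:.2e}")
```

With float state the update eventually becomes too small to change y, so the output stalls short of its target; with double state the shortfall is far below what a float output can even represent.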