thevinn wrote: Okay, so just to be clear, the consensus is that storing sample arrays using double-precision floating-point representation provides no tangible benefit over single precision?

The corollary is that intermediate calculations should be done in double precision: for example, filter coefficients.

Right. Now, to elaborate on filters a little. (Don't read this as saying filters are the only case where you need the extra precision, by any means; I'm just addressing why you'd need it in part of the math but not necessarily in the entire audio path.) First, it's the feedback path that's the issue. You're multiplying an output sample by a coefficient and feeding it back to be summed with the input, which produces a new output, which gets multiplied by the coefficient again, which feeds back to the input...
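A minimal sketch of that feedback loop in Python, for a first-order recursive filter y[n] = x[n] + a*y[n-1]. The function name `one_pole` and this particular filter are illustrative assumptions, not something from the post. The point is just that each output is scaled by the coefficient and summed back in on the next sample, so any error in the coefficient or in that product gets recirculated.

```python
def one_pole(x, a):
    """Hypothetical one-pole recursive filter: y[n] = x[n] + a * y[n-1].

    The previous output is scaled by the coefficient and fed back into
    the sum on every sample, so the impulse response decays as a**n.
    """
    y_prev = 0.0
    out = []
    for sample in x:
        y = sample + a * y_prev   # feedback: the old output re-enters the sum
        out.append(y)
        y_prev = y
    return out


# Impulse response: with a = 0.5 each output is half the previous one.
print(one_pole([1.0, 0.0, 0.0, 0.0], 0.5))   # [1.0, 0.5, 0.25, 0.125]
```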

So, you have an audio sample and a coefficient. The audio sample by itself is suitable for playback in single precision. The coefficient may or may not be adequate at single precision; it depends on the filter type, the order, and the filter setting relative to the sample rate. For instance, a direct-form IIR requires more and more digits of precision as you move a pole to a frequency that is smaller relative to the sample rate. Put another way, if you use, say, 8 bits to the right of the binary point, then as you count down through all the possible values for that coefficient, the corresponding pole positions (where the feedback occurs in frequency) spread out more and more. So you can get into a situation where a pole isn't really where you specified it: it's been quantized to a less desirable position, maybe even onto the unit circle, yielding stability problems. Higher-order filters have more problems, because the poles won't be in the correct spots relative to each other. And at higher sample rates the problem gets worse, because setting a filter to 100 Hz at 192k is much worse than setting it to 100 Hz at 44.1k.
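To put numbers on the 100 Hz example, here is a rough Python sketch that rounds the feedback coefficient of a one-pole lowpass to 8 fractional bits and maps it back to a cutoff. It assumes the common mapping p = exp(-2*pi*fc/fs) between pole and cutoff; the one-pole filter and the helper names `quantize_pole` and `cutoff_from_pole` are illustrative choices, since the post doesn't specify a filter.

```python
import math


def quantize_pole(fc, fs, frac_bits):
    """Pole p = exp(-2*pi*fc/fs) rounded to a grid of 2**-frac_bits."""
    p = math.exp(-2.0 * math.pi * fc / fs)  # ideal feedback coefficient
    step = 2.0 ** -frac_bits                # coefficient resolution
    return round(p / step) * step


def cutoff_from_pole(p, fs):
    """Invert p = exp(-2*pi*fc/fs) to get the cutoff the pole actually gives."""
    return -math.log(p) * fs / (2.0 * math.pi)


for fs in (44100.0, 192000.0):
    p_q = quantize_pole(100.0, fs, 8)
    print(f"fs = {fs:.0f}: asked for 100 Hz, got {cutoff_from_pole(p_q, fs):.1f} Hz")
# fs = 44100: asked for 100 Hz, got 110.5 Hz
# fs = 192000: asked for 100 Hz, got 119.6 Hz

print(quantize_pole(10.0, 192000.0, 8))  # 1.0: the pole lands on the unit circle
```

Under these assumptions the error at 192k is roughly double the error at 44.1k, and a 10 Hz setting at 192k rounds the pole to exactly 1.0.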

For the coefficients, you could just go with double precision, and the problem is solved. But it's not the only thing you can do. There are other filter forms that are equivalent but have either a more homogeneous quantization error (the coefficient spacing is the same everywhere) or the opposite sensitivity (higher density at the low end, worse at the high end; that's a good tradeoff if it's a low-end filter).
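One way to see the "opposite sensitivity" point, sketched in Python under simplifying assumptions: take an undamped two-pole resonator with poles at e^(+/-j*theta). In direct form the feedback coefficient is 2*cos(theta), which barely changes near theta = 0. In the Chamberlin state-variable filter (my example of an alternative form, not one the post names), the undamped tuning coefficient is f = 2*sin(theta/2), which is nearly proportional to theta there. The helper names are hypothetical. With 8 fractional bits, this compares the lowest nonzero frequency each coefficient grid can reach at 44.1 kHz.

```python
import math


def lowest_freq_direct_form(fs, frac_bits):
    """Lowest nonzero pole frequency when c = 2*cos(theta) is on a 2**-frac_bits grid."""
    step = 2.0 ** -frac_bits
    theta = math.acos((2.0 - step) / 2.0)  # first grid value below c = 2 (DC)
    return theta * fs / (2.0 * math.pi)


def lowest_freq_svf(fs, frac_bits):
    """Same, for the Chamberlin SVF tuning coefficient f = 2*sin(theta/2)."""
    step = 2.0 ** -frac_bits
    theta = 2.0 * math.asin(step / 2.0)    # first grid value above f = 0 (DC)
    return theta * fs / (2.0 * math.pi)


print(f"direct form:    {lowest_freq_direct_form(44100.0, 8):.0f} Hz")  # about 439 Hz
print(f"Chamberlin SVF: {lowest_freq_svf(44100.0, 8):.1f} Hz")         # about 27.4 Hz
```

Near Nyquist the roles flip: f approaches 2 and flattens out there, which is the "worse at the high end" half of that tradeoff.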

The other part is the math and the feedback. Part one: when you multiply two numbers, you end up with more bits. If you multiply two floats (24-bit mantissas), it takes a double to hold the full result. The fact that in modern hardware float * float = float means the product is getting rounded back to float precision, and that's error. The higher the IIR order, the more sensitive the filter is to that error. You can go all-double in your calculations, or you can use other tricks where you essentially noise-shape that error by saving it and feeding it back in through another filter. Part two has to do with adding: if you add a very small float to a large one, the small one disappears because there aren't enough mantissa bits (again, that can be addressed with a better filter form).
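Both parts can be checked in Python by emulating single precision: a hypothetical `f32` helper rounds a Python float (an IEEE double) to the nearest single via the standard `struct` module. Because a double has enough extra mantissa bits (53 versus 24), doing one multiply or add in double and rounding once gives the same result as doing it in single precision. The specific values below are arbitrary.

```python
import struct
from fractions import Fraction


def f32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Part one: the product of two 24-bit mantissas can need up to 48 bits.
a = f32(0.1)
b = f32(0.3)
exact = Fraction(a) * Fraction(b)      # exact rational product
prod_double = a * b                    # double: 53-bit mantissa holds it exactly
prod_float = f32(a * b)                # what float * float = float returns

print(Fraction(prod_double) == exact)  # True
print(Fraction(prod_float) == exact)   # False: low bits were rounded away

# Part two: a small addend below half a float ulp of 1.0 simply vanishes.
big = f32(1.0)
small = f32(1e-8)
print(f32(big + small) == big)         # True in single precision
print(big + small == big)              # False in double precision
```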

I'm writing a book here, sorry. The bottom line is that you can either twiddle your architecture to be less error-sensitive, or just up the precision with doubles. But you can keep that inside the filter and pass on the output sample as a float.
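A sketch of that bottom line, again emulating single precision with a hypothetical `f32` helper: the same one-pole lowpass, y += k*(x - y), is run once with every operation rounded to float and once with a double-precision state that only rounds its output to float. This form has unity DC gain for any k, so with a unit step input any residual error is purely arithmetic. The 20 Hz cutoff at 48 kHz, the step length, and the function names are arbitrary assumptions.

```python
import math
import struct


def f32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lowpass_float(x, k):
    """One-pole lowpass, y += k*(x - y), with every operation rounded to float."""
    y = 0.0
    out = []
    for s in x:
        y = f32(y + f32(k * f32(s - y)))
        out.append(y)
    return out


def lowpass_double_state(x, k):
    """Same filter with a double-precision state; only the output is rounded."""
    y = 0.0
    out = []
    for s in x:
        y = y + k * (s - y)   # state stays in double inside the filter
        out.append(f32(y))    # pass the output sample on as a float
    return out


fs = 48000.0
fc = 20.0
k = f32(1.0 - math.exp(-2.0 * math.pi * fc / fs))  # same float coefficient for both
step = [1.0] * 20000                                # unit step input

err_float = 1.0 - lowpass_float(step, k)[-1]
err_double = 1.0 - lowpass_double_state(step, k)[-1]
print(f"all-float final error:    {err_float:.2e}")
print(f"double-state final error: {err_double:.2e}")
```

The all-float version should stall short of 1.0 once k*(x - y) drops below half a float ulp of y (the "small number disappears" case), on the order of 1e-5 with these settings, while the double-state version's output rounds to 1.0.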