When are double-precision samples needed?

DSP, Plug-in and Host development discussion.
codehead
KVRist
30 posts since 22 Dec, 2010

Post Sun Feb 19, 2012 11:50 am

aciddose wrote:
codehead wrote:
aciddose wrote: actually, that is true.
Hard to know what you're referring to without a quote. I hope you're not referring to my last sentence (if so, you didn't get my point: if thermal noise is a problem, and it already dominates waaaayyy before you get out to 64 bits, adding additional bits does nothing to help in that regard, so saying 64 bits might not be enough makes no sense).
because of noise, you get dither. when you average dither, the number of bits increases by one every time you filter away 6 dB of noise.

that means we can for example input a signal to a 1-bit ADC (a compare > 0) and mix it with high-frequency noise.

by filtering away this noise with a low-pass filter, every time we decrease the noise level by half we gain one bit.

let's say we use a 40 kHz sample rate, and we want our input to extend to at least 1 kHz.

to get 25 bits with a cutoff of 1 kHz, we need a filter with a slope of at least 30 dB/octave. easy to achieve.

also, yes, float does have "virtually" 25 bits, but it isn't because of any phantom bit. it's because we take half the range of precision away from the exponent and give it to the mantissa by normalizing the mantissa to a half range (0.5 to 1.0, not 0.0 to 1.0).

so yes, it's true to say that float has at least 25 bits of accuracy in a normalized range like 0.0 to 1.0, but it isn't smart to say this accuracy comes from magic, which is what "normalization and implied bit" tends to sound like. it's easier to just describe it as having 23 bits that apply to half as much range.
Acid, I'm sorry, but I'm at a total loss as to what you're disagreeing with me about. When you posted only "actually, that is true," I couldn't tell which point of my long post you meant (or whether you were addressing me at all). Now you've posted a bunch of other comments that don't seem to disagree with me (or do they?) about any point in particular, though it's clear you're addressing me this time.

Sorry for being dense, but is there something you think I got wrong? Could you quote the part of my post that applies? I didn't say anything about a "phantom bit", for instance; I said a bit was gained from normalization, same as you said (though the range isn't 0.5 to 1.0 as you said; it's 1.0 to 1.99999..., but that's a trivial detail, since you could redefine the exponent offset and call it 0.5 to 0.99999..., though not 1.0).
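A small sketch of the two normalization conventions being argued about, assuming IEEE 754 single and double precision. `std::numeric_limits<float>::digits` counts the implicit leading bit (24 = 23 stored + 1; counting the sign over a bipolar range presumably gives the 25 used above), and `std::frexp` reports the mantissa in the 0.5 to 1 range, whereas the stored encoding is 1.xxx. The names `decompose` and `floatHoldsExactly` are illustrative, not library functions.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Significand bits including the implicit leading 1 (IEEE 754 assumed):
// 24 for float, 53 for double.
constexpr int kFloatDigits = std::numeric_limits<float>::digits;
constexpr int kDoubleDigits = std::numeric_limits<double>::digits;

struct Decomposed {
    double mantissa; // 0.5 <= |mantissa| < 1 for nonzero finite input
    int exponent;    // value == mantissa * 2^exponent
};

// std::frexp uses the 0.5..1 convention; the IEEE 754 bit pattern stores
// 1.xxx instead. Same value, the exponent is simply offset by one.
Decomposed decompose(float v)
{
    Decomposed d{0.0, 0};
    d.mantissa = std::frexp(static_cast<double>(v), &d.exponent);
    return d;
}

// True if the integer v survives a round trip through float unchanged.
bool floatHoldsExactly(long long v)
{
    return static_cast<long long>(static_cast<float>(v)) == v;
}
```

For example, 3.0f decomposes as 0.75 × 2^2, and 2^24 + 1 = 16777217 is the first integer that does not survive conversion to float (it rounds to 16777216).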

codehead

Post Sun Feb 19, 2012 12:08 pm

andy_cytomic wrote:
thevinn wrote: Okay so just to be clear, the consensus is that storing sample arrays using double precision floating point representation provides no tangible benefit over single precision?

The corollary is that intermediate calculations should be done in double precision: for example, filter coefficients.
Yes to the first bit, but for the second bit you only need double precision for certain filter structures under certain circumstances. I recommend to never use a direct form 1 or 2 biquad ever, period, but if you must then you really do need double precision for all coefficients and memory since the structure is so bad it adds significant quantization error to your signal, as well as the cutoff being nowhere near where you want it. Check out the plots I've done here that compare different digital linear filter topologies: www.cytomic.com/technical-papers
Andy, I agree, mostly—if a person doesn't understand the pitfalls, they should avoid using direct form, or more likely just use double precision (I say more likely, because if they aren't experienced enough to understand the pitfall to begin with, they probably would be gravitating towards readily available material such as rbj's IIR formulas or some other semi-canned solution—that's why I say "mostly").
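To make the "cutoff nowhere near where you want it" point concrete, here is a sketch under our own assumptions (RBJ cookbook low-pass, 20 Hz at 48 kHz, Q = 0.7071; the function names are illustrative). It computes where the poles actually sit when a1 and a2 are stored as double versus float. At low cutoffs the poles crowd toward z = 1, so the pole angle hinges on the tiny gap between -a1/(2r) and 1, and rounding the stored coefficients to float moves it measurably.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Feedback coefficients of the RBJ cookbook low-pass, normalised so a0 == 1.
struct PoleCoeffs { double a1, a2; };

PoleCoeffs rbjLowpassPoles(double fc, double fs, double q)
{
    const double w0 = 2.0 * kPi * fc / fs;
    const double alpha = std::sin(w0) / (2.0 * q);
    const double a0 = 1.0 + alpha;
    return { -2.0 * std::cos(w0) / a0, (1.0 - alpha) / a0 };
}

// Angle of the complex pole pair of 1 + a1 z^-1 + a2 z^-2, in Hz.
// Poles are r e^(+-j theta) with r^2 = a2 and 2 r cos(theta) = -a1.
// T is the type the coefficients are *stored* in; the analysis itself runs
// in double, so only the storage rounding is measured. Needs Q > 0.5.
template <typename T>
double poleFrequencyHz(double a1, double a2, double fs)
{
    const double a1s = static_cast<double>(static_cast<T>(a1));
    const double a2s = static_cast<double>(static_cast<T>(a2));
    assert(a1s * a1s < 4.0 * a2s); // complex pole pair
    const double r = std::sqrt(a2s);
    return std::acos(-a1s / (2.0 * r)) * fs / (2.0 * kPi);
}
```

Comparing `poleFrequencyHz<double>(p.a1, p.a2, 48000.0)` with `poleFrequencyHz<float>(p.a1, p.a2, 48000.0)` isolates the error caused purely by coefficient storage; the relative gap grows roughly as 1/fc² as the cutoff drops.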

That said, I'll add that, among the direct forms, you should use direct form I on a fixed-point processor (it has a single accumulation point, and fixed-point DSPs have extra headroom in the accumulator) and transposed direct form II for floating point (better numerical characteristics than the non-transposed form). And I can tell you that 24-bit integer math is *not* sufficient for a direct form I at lower audio frequencies (and it gets worse as you support higher sample rates). I know for a fact that people have implemented these in pro plug-ins without examining what the filter is really doing, and I'm sure that many plug-in developers have added higher-sample-rate support without suspecting that their filters have gone to hell.
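A minimal sketch of the two structures recommended above, assuming coefficients normalised to a0 = 1 (`rbjLowpass` is only there to have something realistic to feed them; all names are illustrative). Mathematically they are the same filter; what differs is where rounding happens and how much state is kept.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Biquad coefficients normalised so that a0 == 1.
struct BiquadCoeffs { double b0, b1, b2, a1, a2; };

// RBJ cookbook low-pass (any normalised coefficient set would do).
BiquadCoeffs rbjLowpass(double fc, double fs, double q)
{
    const double w0 = 2.0 * kPi * fc / fs;
    const double cw = std::cos(w0);
    const double alpha = std::sin(w0) / (2.0 * q);
    const double a0 = 1.0 + alpha;
    return { (1.0 - cw) / 2.0 / a0, (1.0 - cw) / a0, (1.0 - cw) / 2.0 / a0,
             -2.0 * cw / a0, (1.0 - alpha) / a0 };
}

// Direct form I: two input and two output histories; every product is
// summed at one point, which suits a wide fixed-point accumulator.
template <typename T>
struct DirectForm1 {
    T x1{}, x2{}, y1{}, y2{};
    T process(const BiquadCoeffs& c, T x)
    {
        const T y = T(c.b0) * x + T(c.b1) * x1 + T(c.b2) * x2
                  - T(c.a1) * y1 - T(c.a2) * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};

// Transposed direct form II: only two states, the usual floating-point choice.
template <typename T>
struct DirectForm2T {
    T s1{}, s2{};
    T process(const BiquadCoeffs& c, T x)
    {
        const T y = T(c.b0) * x + s1;
        s1 = T(c.b1) * x - T(c.a1) * y + s2;
        s2 = T(c.b2) * x - T(c.a2) * y;
        return y;
    }
};
```

Run in double, the two produce the same impulse response to within rounding; the differences only start to matter once the arithmetic type is narrow relative to how close the poles sit to z = 1.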

First order noise shaping is pretty cheap though, and usually does the trick, with second order only a little more expensive.
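A minimal sketch of first-order noise shaping (error feedback) in a fixed-point direct form I. The assumptions are ours: Q30 coefficients held in 64-bit integers, integer samples, an RBJ 20 Hz low-pass at 48 kHz, and truncation back to sample resolution; `FixedDf1` and `floorToSample` are illustrative names. Without the feedback, a small DC input (1000 LSBs here) never gets past the truncation: every accumulator value stays below one output LSB, so the output sits at zero forever. Adding the discarded remainder back in on the next sample puts a zero at DC in the error's path, so the output tracks the input on average.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double kPi = 3.14159265358979323846;
constexpr int kFracBits = 30;                          // Q30 coefficients (assumed)
constexpr std::int64_t kOne = std::int64_t(1) << kFracBits;

// floor(v / 2^kFracBits), written out so it does not rely on >> for negatives.
std::int64_t floorToSample(std::int64_t v)
{
    std::int64_t q = v / kOne;
    if (v % kOne != 0 && v < 0) --q;
    return q;
}

// Fixed-point direct form I low-pass (RBJ cookbook coefficients) with an
// optional first-order error feedback.
struct FixedDf1 {
    std::int64_t b0 = 0, b1 = 0, b2 = 0, a1 = 0, a2 = 0;  // Q30
    std::int64_t x1 = 0, x2 = 0, y1 = 0, y2 = 0;           // integer samples
    std::int64_t err = 0;                                  // last remainder, in [0, kOne)
    bool shapeNoise = false;

    FixedDf1(double fc, double fs, double q, bool shape) : shapeNoise(shape)
    {
        const double w0 = 2.0 * kPi * fc / fs;
        const double cw = std::cos(w0);
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        auto toQ = [](double v) {
            return static_cast<std::int64_t>(std::llround(v * static_cast<double>(kOne)));
        };
        b0 = toQ((1.0 - cw) / 2.0 / a0);
        b1 = toQ((1.0 - cw) / a0);
        b2 = b0;
        a1 = toQ(-2.0 * cw / a0);
        a2 = toQ((1.0 - alpha) / a0);
    }

    std::int64_t process(std::int64_t x)
    {
        std::int64_t acc = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        if (shapeNoise) acc += err;           // feed back last sample's remainder
        const std::int64_t y = floorToSample(acc);
        err = acc - y * kOne;                 // what the truncation discarded
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};
```

Second-order shaping would feed back the last two remainders with their own weights; the first-order version is the cheap case mentioned above.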

Edit: Oh, I should add the obvious—direct form filters suck for modulating at audio rates and such, due to lack of parameterized Q and frequency, so I'm talking about using them as utility filters within algorithms, or as simple fixed audio filters on their own. Don't even think about trying to fudge these things into being synth filters.

aciddose
KVRAF
12289 posts since 7 Dec, 2004

Post Sun Feb 19, 2012 12:45 pm

codehead wrote:...
it's regarding where you said that you saw someone saying because of thermal noise even double wouldn't be enough. that's true.

i'm not disagreeing with you since you never mentioned why you thought it wasn't true. you only said you didn't believe it.

in the case i describe, it's true. when you filter that noise to extract a particular component, you can increase the bits with steeper filters. it's trivial to get float accuracy from a one-bit signal for a value below 1 kHz with a 48 kHz (or anything near it) sampling rate.
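A minimal sketch of the dither-and-average idea, assuming a 1-bit comparator, uniform dither in [-1, 1) and a plain average as the low-pass filter (`estimateWithOneBitAdc` is a hypothetical name). With uniform dither, E[sign(x + d)] = x exactly, so the average converges to the input level. With white dither and plain averaging, though, the error only falls as 1/sqrt(N), so each extra bit (6 dB) costs four times as many samples; getting many bits in a narrow band quickly relies on pushing the noise out of that band (noise shaping), which is what delta-sigma converters do.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Estimate a constant level x (|x| < 1) from a 1-bit comparator fed with
// uniform dither in [-1, 1), by averaging n decisions.
double estimateWithOneBitAdc(double x, std::size_t n, unsigned seed = 1234)
{
    assert(x > -1.0 && x < 1.0 && n > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dither(-1.0, 1.0);

    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // the "1-bit ADC": a compare against zero
        sum += (x + dither(rng) > 0.0) ? 1 : -1;
    }
    // plain averaging acts as a crude (boxcar) low-pass filter
    return static_cast<double>(sum) / static_cast<double>(n);
}
```

For x = 0.3 and one second of samples at 40 kHz, the standard deviation of the estimate is about sqrt((1 - 0.09) / 40000) ≈ 0.005, i.e. only around 7 to 8 bits.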

so assuming you want to use your pc to handle very accurate low-frequency measurements taken from low-drift sources, doubles might not be enough. even long-doubles might not, but then we're starting to get into territory where it's hard to come up with an example that would actually need that much accuracy even before normalization.

if you re-normalize every step and apply your filter in stages you could even get away with float.

i was just trying to give an example of where someone might actually have been making a perfectly valid point. it often happens that someone repeats this and we get into Chinese whispers, to the point where it's very difficult to understand what the original point was.
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.

codehead

Post Sun Feb 19, 2012 2:08 pm

aciddose wrote:
codehead wrote:...
it's regarding where you said that you saw someone saying because of thermal noise even double wouldn't be enough. that's true.
OK, I guess you didn't get what I was saying, or missed it in the follow-up when I explained, so I'll put it another way...

If you've got an 8-bit sample system, thermal noise is not an issue—other errors dominate.

If you've got a 32-bit float system, you are definitely at a point where thermal noise on playback is an issue, although other errors may still dominate, depending on the processing you're doing.

If you've got a 64-bit float system, thermal noise dominates. And certainly the thermal noise didn't get worse—it's just that it's more likely to be the limiting factor because other errors got better.

So, to say "64-bit might not be enough to get a clean sound—because of thermal noise, you might need even more than that" would be ludicrous. More bits won't get rid of thermal noise. Thermal noise is the same in all of these examples—the only thing that changes is how much of a limiting factor it is. To improve it, you buy better converters and amplifiers—the number of bits has nothing to do with it.
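A back-of-the-envelope sketch of why thermal noise, not word length, sets the floor. The assumptions are illustrative only: a single 1 kΩ resistor at 300 K, a 20 kHz bandwidth, a 1 V rms reference, and the textbook 6.02N + 1.76 dB figure for an ideal N-bit quantiser with a full-scale sine. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Ideal SNR of an N-bit quantiser for a full-scale sine: 6.02*N + 1.76 dB.
double quantisationSnrDb(int bits)
{
    return 20.0 * std::log10(std::pow(2.0, bits) * std::sqrt(1.5));
}

// Johnson-Nyquist noise of a resistor: v_rms = sqrt(4 k T R B).
double johnsonNoiseVrms(double ohms, double kelvin, double bandwidthHz)
{
    constexpr double kBoltzmann = 1.380649e-23; // J/K
    return std::sqrt(4.0 * kBoltzmann * kelvin * ohms * bandwidthHz);
}

// Level in dB relative to 1 (here: 1 V rms).
double toDb(double ratio) { return 20.0 * std::log10(ratio); }
```

With those assumptions, `johnsonNoiseVrms(1000.0, 300.0, 20000.0)` is roughly 0.58 µV, about -125 dB relative to 1 V, while `quantisationSnrDb(24)` is about 146 dB and `quantisationSnrDb(53)` about 321 dB: the one resistor already sits far above the quantisation floor of a double.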

I know I'm not telling you anything you don't already know, but maybe you understand my comment now.
if you re-normalize every step and apply your filter in stages you could even get away with float.
Not understanding this comment—floats "re-normalize" themselves (normally ;-)). Instead, you need to manage the error (by retaining and processing it, but using doubles at critical points, and/or by managing the algorithms you're using in order to minimize error).

aciddose

Post Sun Feb 19, 2012 3:14 pm

codehead wrote: So, to say "64-bit might not be enough to get a clean sound—because of thermal noise, you might need even more than that" would be ludicrous. More bits won't get rid of thermal noise. Thermal noise is the same in all of these examples—the only thing that changes is how much of a limiting factor it is. To improve it, you buy better converters and amplifiers—the number of bits has nothing to do with it.
i always understood your point. you might still not understand mine. what i explained was that you actually only need a 1-bit convertor: using dither and a filter, you can get an infinite number of bits of accuracy when measuring a value from this convertor, given an infinite amount of time.

we can easily produce float's range of accuracy within 1 ms when we sample at 40 kHz.

like i said, Chinese whispers. people hear a description like this saying you might need to store more than 25 bits of a value, or even more than 50 bits, or 256 bits, or any number. they don't understand the reason, but they go on to repeat portions of the justification while filling in their own, perhaps totally insane and incorrect, reasons.

my point was just that there are valid reasons that floats and doubles wouldn't be enough for some computations, regardless of how accurate your convertor is.

by the way, most modern convertors are three-bit wide delta-sigma convertors which use filters and high-frequency dither to produce 24-bit results regardless of the noise present in the source signal.

the limit isn't your convertor, the limit is the width (in time) of your samples.
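A minimal first-order delta-sigma sketch (illustrative only; `deltaSigmaEstimate` is a hypothetical name). Real converters use higher-order loops, multi-bit quantisers and steep multi-stage decimation filters, which is where the large resolution gains come from. Even this first-order loop shows the key difference from plain white dither: because the integrator state stays bounded (within [x - 1, x + 1)), the averaged output's error is at most about (1 + |x|)/n, shrinking like 1/n rather than 1/sqrt(n).

```cpp
#include <cassert>
#include <cstddef>

// First-order delta-sigma modulator with a 1-bit quantiser, followed by a
// plain average as the decimation filter.
double deltaSigmaEstimate(double x, std::size_t n)
{
    assert(x > -1.0 && x < 1.0 && n > 0);
    double integrator = 0.0;
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const int bit = (integrator >= 0.0) ? 1 : -1;  // 1-bit quantiser
        sum += bit;
        integrator += x - bit;                         // integrate the error
    }
    // sum == n*x - integrator, so the mean is off by integrator/n
    return static_cast<double>(sum) / static_cast<double>(n);
}
```

For x = 0.3, 1000 samples already pin the level down to within about 0.0013.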
codehead wrote: Not understanding this comment—floats "re-normalize" themselves (normally ;-)). Instead, you need to manage the error (by retaining and processing it, but using doubles at critical points, and/or by managing the algorithms you're using in order to minimize error).
i mean as you extract more bits you could continue processing your filters in float and shifting the extra bits elsewhere. you'd need to deal with the end result of however many bits though, eventually. it just isn't likely you'd really want to use the standard opcodes to do that. you could use a bignum implementation instead.
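The extra bits do not have to be lost even with float-only arithmetic. One standard way of "retaining and processing the error" is compensated (Kahan) summation; this is a swapped-in illustration of the idea, not necessarily the scheme aciddose has in mind, and the function names are illustrative. It assumes strict IEEE float evaluation (e.g. SSE on x86-64).

```cpp
#include <cassert>
#include <vector>

// Plain float accumulation: the low-order bits of each small addend are lost.
float naiveSum(const std::vector<float>& v)
{
    float sum = 0.0f;
    for (float x : v) sum += x;
    return sum;
}

// Kahan (compensated) summation: the rounding error of each addition is kept
// in a second float and subtracted from the next addend, so the result
// behaves roughly as if accumulated with twice the precision.
// Value-unsafe optimisations such as -ffast-math can remove the compensation.
float kahanSum(const std::vector<float>& v)
{
    float sum = 0.0f;
    float comp = 0.0f;                 // rounding error of the last addition
    for (float x : v) {
        const float y = x - comp;
        const float t = sum + y;
        comp = (t - sum) - y;          // what was rounded away forming t
        sum = t;
    }
    return sum;
}
```

Adding 1e-8f a million times to 1.0f leaves the naive sum at exactly 1.0f (each addend is below half an ulp of 1), while the compensated sum reaches about 1.01.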

codehead

Post Sun Feb 19, 2012 5:10 pm

aciddose wrote: my point was just that there are valid reasons that floats and doubles wouldn't be enough for some computations, regardless of how accurate your convertor is.
Sure, but the context of the article I commented on was the "audio path". And in that context, 64-bit is questionable, and greater than that is... ridiculous. And again, the issue was thermal noise, which is independent of word length.

And of course, I think all of us have been in agreement that double, possibly more, are needed for some computations. So, I guess that if you did understand my comment from the beginning, then you weren't disagreeing with me—and that was what I was unsure of in past messages...

andy-cytomic
KVRAF
2252 posts since 3 Dec, 2008

Post Sun Feb 19, 2012 5:54 pm

codehead wrote:
andy_cytomic wrote: Vector instructions on Intel chips can process twice as many floats as doubles.
Excellent point—yes, I was considering normal CPUs (not DSPs), but I forgot about vector processing. I guess the bottom line is that if a processing unit is built to handle only one floating point operation at a time, double will execute as fast as single; but if it's built to do multiple parallel operations, it's going to optimize real estate...
I don't know of any architecture where this is the case; usually there is some sort of overhead in cycles or memory loads or something which makes double precision cost more than single. In the case of Texas Instruments chips, doubles take twice as long to compute. Intel and AMD chips are ruled out here, as they are not built to handle only one floating point operation at a time (which I am taking to mean non-vector, since most processors pipeline floating point operations, so there can actually be multiple scalar operations in the pipeline at once, each at a different stage of execution).
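The vector-width argument in numbers, as a portable sketch with no intrinsics: register widths of 128 bits (SSE) and 256 bits (AVX) and 8-bit bytes are the only assumptions, and `lanes`/`vectorOps` are illustrative names. Per vector instruction, float gets twice the lanes of double, so a block of samples needs half as many operations.

```cpp
#include <cassert>
#include <cstddef>

// How many values of type T fit in one SIMD register of the given width.
template <typename T>
constexpr std::size_t lanes(std::size_t registerBits)
{
    return registerBits / (8 * sizeof(T));
}

// Vector operations needed for n samples when each operation fills one
// register; a partially filled last register counts as a full operation.
template <typename T>
constexpr std::size_t vectorOps(std::size_t n, std::size_t registerBits)
{
    return (n + lanes<T>(registerBits) - 1) / lanes<T>(registerBits);
}

static_assert(lanes<float>(128) == 4, "SSE: four floats per register");
static_assert(lanes<double>(128) == 2, "SSE: two doubles per register");
```

So a 1024-sample block takes 128 AVX operations per arithmetic step in float but 256 in double, before any of the extra latency or memory traffic doubles may add.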
The Glue, The Drop - www.cytomic.com

andy-cytomic

Post Sun Feb 19, 2012 7:13 pm

codehead wrote:
andy_cytomic wrote: ... I recommend to never use a direct form 1 or 2 biquad ever, period, but if you must then you really do need double precision for all coefficients and memory since the structure is so bad it adds significant quantization error to your signal, as well as the cutoff being nowhere near where you want it. Check out the plots I've done here that compare different digital linear filter topologies: www.cytomic.com/technical-papers
Andy, I agree, mostly—if a person doesn't understand the pitfalls, they should avoid using direct form, or more likely just use double precision (I say more likely, because if they aren't experienced enough to understand the pitfall to begin with, they probably would be gravitating towards readily available material such as rbj's IIR formulas or some other semi-canned solution—that's why I say "mostly").
Or you could use the modified trapezoidal svf structure I have proposed, which has excellent noise properties, places the cutoff where you want it, can be modulated, and is easy to compute, and is a semi canned solution since the pseudo code is right here: www.cytomic.com/technical-papers

codehead

Post Mon Feb 20, 2012 9:57 am

andy_cytomic wrote: Or you could use the modified trapezoidal svf structure I have proposed, which has excellent noise properties, places the cutoff where you want it, can be modulated, and is easy to compute, and is a semi canned solution since the pseudo code is right here: www.cytomic.com/technical-papers
That is a very nice filter, thank you Andy.

andy-cytomic

Post Mon Feb 20, 2012 7:04 pm

codehead wrote:
andy_cytomic wrote: Or you could use the modified trapezoidal svf structure I have proposed, which has excellent noise properties, places the cutoff where you want it, can be modulated, and is easy to compute, and is a semi canned solution since the pseudo code is right here: www.cytomic.com/technical-papers
That is a very nice filter, thank you Andy.
You're welcome! I actually solved the filter based purely on circuit simulation based trapezoidal numerical integration (ie no delays in any feedback path and correct phase and amplitude at cutoff and dc, and at nyquist it matches the response at +inf), but in the technical paper I have used direct replacement of the integrators directly by trapezoidal integrators as this will most likely make more sense to a majority of people.

Although it already sounds good for audio-rate modulation, I've also solved the equations taking into account the effect that changing coefficients has, and you can hear the result of using this extra information (but in a non-linear trapezoidal-integrated Sallen-Key filter) on The Drop product announcement page on my web site, which includes non-oversampled examples.

I have also now solved for several variations on shelving shapes including one which matches the shape that RBJ uses in his very well known technical paper. Since the SVF is trapezoidal integrated it has an identical magnitude and phase response to the ideal bilinear transformed prototype filters.
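A sketch that checks the "identical to the bilinear-transformed prototype" claim for the low-pass case. The SVF below is derived from two trapezoidal integrators solved without a unit delay in the feedback loop (v1 = g(v0 - k v1 - v2) + ic1eq, v2 = g v1 + ic2eq, each state updated as ic = 2v - ic); it matches the form commonly reproduced from the Cytomic paper, but check it against www.cytomic.com/technical-papers before relying on it. `TrapSvf` and `RbjLowpassDf2t` are illustrative names, and the comparison filter is the RBJ cookbook low-pass.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Trapezoidal-integrated state variable filter; low-pass output only.
struct TrapSvf {
    double a1 = 0, a2 = 0, a3 = 0, k = 0;
    double ic1eq = 0, ic2eq = 0;                     // integrator states

    void setLowpass(double fc, double fs, double q)
    {
        const double g = std::tan(kPi * fc / fs);   // prewarped integrator gain
        k = 1.0 / q;
        a1 = 1.0 / (1.0 + g * (g + k));
        a2 = g * a1;
        a3 = g * a2;
    }
    double process(double v0)
    {
        const double v3 = v0 - ic2eq;
        const double v1 = a1 * ic1eq + a2 * v3;          // band-pass
        const double v2 = ic2eq + a2 * ic1eq + a3 * v3;  // low-pass
        ic1eq = 2.0 * v1 - ic1eq;
        ic2eq = 2.0 * v2 - ic2eq;
        return v2;
    }
};

// RBJ cookbook low-pass in transposed direct form II, for comparison.
struct RbjLowpassDf2t {
    double b0 = 0, b1 = 0, b2 = 0, a1 = 0, a2 = 0, s1 = 0, s2 = 0;
    RbjLowpassDf2t(double fc, double fs, double q)
    {
        const double w0 = 2.0 * kPi * fc / fs;
        const double cw = std::cos(w0);
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        b0 = (1.0 - cw) / 2.0 / a0;
        b1 = (1.0 - cw) / a0;
        b2 = b0;
        a1 = -2.0 * cw / a0;
        a2 = (1.0 - alpha) / a0;
    }
    double process(double x)
    {
        const double y = b0 * x + s1;
        s1 = b1 * x - a1 * y + s2;
        s2 = b2 * x - a2 * y;
        return y;
    }
};
```

Both filters are the bilinear transform of 1/(s² + s/Q + 1) prewarped at the cutoff, so their impulse responses agree to rounding error; the difference lies in how the state is stored and how it behaves when coefficients change.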

I'm working on a full paper to submit to the AES with a much more rigorous analysis of the structure, but Ross Bencina can vouch for the superior numerical properties: he was using an all-double-precision direct form 1 biquad to do some critical filtering but was having lots of trouble, and when he switched to the trapezoidal SVF everything suddenly worked perfectly :) Ross can fill you in on the details, but I think it was filtering time sync information.

mystran
KVRAF
5381 posts since 12 Feb, 2006 from Helsinki, Finland

Post Tue Feb 21, 2012 12:07 am

+1 on never using direct forms (well, for first order filters they are fine).

I've posted this before, but my favourite simple biquad is as follows (you can use singles instead of doubles for the temporaries and states; I'm just a perfectionist):

// KillDenormals() is the poster's own helper (not shown), presumably flushing denormals to zero
double tmp = KillDenormals(z1 * f + (in - z2) * e);
double out = tmp * t0 + z1 * t1 + z2 * t2;

// store new states
z2 = KillDenormals(tmp * e + z2);
z1 = tmp;
Essentially modified coupled form (and not too different from classic digital SVF); as far as I can tell it's got decent numerical performance and it's pretty cheap to calculate. State variables are z1,z2, zero-coeff are t0,t1,t2 and pole-coeffs e,f which can be calculated from [b0,b1,b2,1,a1,a2] direct form coeffs as follows:

f = a2;
e = sqrt(1 + a1 + a2);  // needs 1 + a1 + a2 > 0, true for any stable pole pair

t0 = b0 / e;
t1 = -b2 / e;
t2 = (b2 + b1 + b0) / (e*e);
Obviously substituting the above directly into BLT formulas is possible, but I'm not convinced it's necessarily worth the trouble as long as you use double precision to calculate the coeffs. Designed such that internal gains are more or less independent of tuning (and DC gain more or less independent of resonance; not exactly normalized 'cos it's close enough for my purposes) so reasonably fast modulations work fine. Not necessarily the perfect synth filter, but it's what I use for all utility biquads now.

Oh and Andy, if you still got the test setup for the noise performance, I'd love to know how the above compares with the others you tested. :)
If you'd like Signaldust to return, please ask Katinka Tuisku to resign.

codehead

Post Tue Feb 21, 2012 1:14 pm

andy_cytomic wrote: You're welcome! I actually solved the filter based purely on circuit simulation based trapezoidal numerical integration (ie no delays in any feedback path and correct phase and amplitude at cutoff and dc, and at nyquist it matches the response at +inf), but in the technical paper I have used direct replacement of the integrators directly by trapezoidal integrators as this will most likely make more sense to a majority of people.
I did some tests (single-precision, no oversampling) yesterday, verifying the response at various settings by pinging it, and various-rate exponential sweeps, 18 kHz to 20 Hz driven by a band-limited sawtooth. Definitely an exceptional filter, and especially for the computational price (uh, the actual price is real good too—thanks again Andy! ;-) ).

andy-cytomic

Post Tue Feb 21, 2012 6:35 pm

mystran wrote: Oh and Andy, if you still got the test setup for the noise performance, I'd love to know how the above compares with the others you tested. :)
Hi Mystran,

Thanks for posting the filter structure. Is this one you created yourself? I'll add it to my test framework when I get some time.

For the record, I'm doing a pretty basic test at the moment: I run the filters on the sum of 20, 200, 2000 and 20000 Hz sine waves, then take a big long FFT, notch out the sines and take the RMS of the leftover stuff, which shouldn't be there since the filters are linear. I think a sine-sweep deconvolution may be more accurate, but it will take some time to implement and make sure it is working properly.
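A simpler stand-in for that measurement, swapped in for illustration (it is not the FFT-notch method described above): because the filters are linear, run the same filter in float and in double on the same four-sine test signal and take the RMS of the difference. This lumps coefficient rounding and arithmetic rounding together. The choice of transposed direct form II, the 0.25 amplitude per sine and all names are assumptions; any structure could be dropped into the template.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>

constexpr double kPi = 3.14159265358979323846;

struct Coeffs { double b0, b1, b2, a1, a2; };   // normalised, a0 == 1

Coeffs rbjLowpass(double fc, double fs, double q)
{
    const double w0 = 2.0 * kPi * fc / fs;
    const double cw = std::cos(w0);
    const double alpha = std::sin(w0) / (2.0 * q);
    const double a0 = 1.0 + alpha;
    return { (1.0 - cw) / 2.0 / a0, (1.0 - cw) / a0, (1.0 - cw) / 2.0 / a0,
             -2.0 * cw / a0, (1.0 - alpha) / a0 };
}

// Transposed direct form II, with coefficients and state held in type T.
template <typename T>
struct Df2t {
    T b0, b1, b2, a1, a2, s1 = 0, s2 = 0;
    explicit Df2t(const Coeffs& c)
        : b0(T(c.b0)), b1(T(c.b1)), b2(T(c.b2)), a1(T(c.a1)), a2(T(c.a2)) {}
    T process(T x)
    {
        const T y = b0 * x + s1;
        s1 = b1 * x - a1 * y + s2;
        s2 = b2 * x - a2 * y;
        return y;
    }
};

// RMS of (filter in T) minus (same filter in double), driven by the sum of
// 20, 200, 2000 and 20000 Hz sines.
template <typename T>
double residualRms(const Coeffs& c, double fs, std::size_t n)
{
    Df2t<T> test(c);
    Df2t<double> reference(c);
    double sumSq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double x = 0.0;
        for (double f : {20.0, 200.0, 2000.0, 20000.0})
            x += 0.25 * std::sin(2.0 * kPi * f * static_cast<double>(i) / fs);
        const double d = static_cast<double>(test.process(static_cast<T>(x)))
                       - reference.process(x);
        sumSq += d * d;
    }
    return std::sqrt(sumSq / static_cast<double>(n));
}
```

A reference run in double cannot measure double-precision structures themselves; for that, a wider reference (or the FFT approach) is needed.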

mystran

Post Wed Feb 22, 2012 9:32 am

andy_cytomic wrote:
mystran wrote: Oh and Andy, if you still got the test setup for the noise performance, I'd love to know how the above compares with the others you tested. :)
Hi Mystran,

Thanks for posting the filter structure. Is this one you created yourself? I'll add it to my test framework when I get some time.
Well, I don't know how much credit I should take, considering it's not that different from known filters, but yeah, I did come up with it myself. What I did was look at possible state-space matrices; I ultimately picked the modified coupled form and added decay so that it can realize all stable pole configurations without singularities. Similarly, from the state-space formulation I then searched the possible tap configurations (there are actually at least two sensible ones; the one above is probably the simplest to calculate, and IMHO gives the "least bad" results if you throw a saturation around the "tmp" variable to stabilize it for unstable coefficient sets) to find one that would allow any zero configuration without singularities.

Anyway, mostly I just wanted something better than direct form (for arbitrary stable 2-poles), and I wasn't quite happy with well-known structures like ladder/lattice. Since I'm using these for every biquad I have, I didn't really want to add significant calculation overhead, so I went with a state-space solution. The one above is my 3rd or 4th iteration, actually. :)

andy-cytomic

Post Wed Feb 22, 2012 7:13 pm

Mystran: if this exact structure wasn't in existence before you came up with it you can take all credit for it. In fact you have done much more work than me it sounds like, I just applied standard numerical integration techniques to a standard filter structure and normalised the coefficients a bit so that the tan term didn't get large at high frequencies. I really look forward to doing a numerical analysis of your structure, and thank you for sharing it.
