floating point EQ?

DSP, Plugin and Host development discussion.

Post

1. Can you explain to me when and why to use floating point internal processing in an EQ filter?

2. If I convert 16/24-bit audio to 32- or even 64-bit floating point, does it introduce some distortion? I mean, if I keep converting it back and forth many times, will it get progressively different from the original before the conversion?

3. I tried hard to understand floating point but I still don't get it. Is it using fractional numbers like 1.3334 instead of whole numbers like 1, 2, 3?

4. Normal 32-bit PCM has a 192 dB dynamic range, but 32-bit float has about 1680 dB. How is that possible? And if floating point has a bigger dynamic range per bit, why isn't music released as floating point?

Post

with integers, let's take 8-bit signed for example to keep the numbers small:
you'd represent your audio signal as numbers between -128 and +127, where 0 is silence.
now, the amplitude step between +126 and +127 is the same size as the step between 0 and +1.
that's great for the loud parts, but very bad for the quiet parts.

floating point works in a different way:
with it, the loud parts have bigger amplitude steps, but the quieter you go, the smaller the steps get.
a very rough example: 1.0, 0.5, 0.25, 0.125, 0.0625, ...
that's why most audio, image, and video processing is done using floating point math.
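for illustration, here's a small C sketch (my example, not part of the original post) that prints the step between adjacent 32-bit floats at a few signal levels:

#include <math.h>
#include <stdio.h>

int main(void) {
    /* nextafterf(x, INFINITY) is the next representable float above x,
       so the difference is the local step size at that magnitude */
    const float levels[] = { 0.0001f, 0.01f, 1.0f, 100.0f };
    for (int i = 0; i < 4; i++) {
        float x = levels[i];
        printf("step near %g: %g\n", x, nextafterf(x, INFINITY) - x);
    }
    return 0;
}

the steps near 0.0001 come out around 7e-12, but near 100 they're around 8e-6: same relative precision, very different absolute precision.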

lots of audio and image formats use integers because the files can be smaller.
some of them encode the integers in a nonlinear way to overcome the lack of resolution near zero.

but as storage capacity grows, file formats storing floating point are being used more often
It doesn't matter how it sounds..
..as long as it has BASS and it's LOUD!

irc.libera.chat >>> #kvr

Post

neodymDNB wrote:
1. Can you explain to me when and why to use floating point internal processing in an EQ filter?

2. If I convert 16/24-bit audio to 32- or even 64-bit floating point, does it introduce some distortion? I mean, if I keep converting it back and forth many times, will it get progressively different from the original before the conversion?

3. I tried hard to understand floating point but I still don't get it. Is it using fractional numbers like 1.3334 instead of whole numbers like 1, 2, 3?

4. Normal 32-bit PCM has a 192 dB dynamic range, but 32-bit float has about 1680 dB. How is that possible? And if floating point has a bigger dynamic range per bit, why isn't music released as floating point?
1) Always.
2) No. 16-bit and 24-bit integers can be represented exactly in floating point, even in 32-bit float (which carries 24 bits of precision), so converting back and forth is lossless (see the sketch below).
3) Yes: a float is an integer (the mantissa) multiplied by a power of 2.
4) Because in the past, floating point computation was more expensive and not available in all DSPs.
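For illustration, a minimal C sketch (mine, not from the post) showing that every 16-bit sample survives the round trip through 32-bit float:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int losses = 0;
    /* scale to the conventional [-1.0, 1.0) range and back */
    for (int32_t s = -32768; s <= 32767; s++) {
        float f = (float)s / 32768.0f;           /* int16 -> float */
        int32_t back = (int32_t)(f * 32768.0f);  /* float -> int16 */
        if (back != s) losses++;
    }
    printf("samples changed by the round trip: %d\n", losses);  /* prints 0 */
    return 0;
}

Both scalings are by a power of two, so no rounding ever occurs.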

Post

does an EQ really need to work internally in floating point? is there any benefit?

Post

Precision, yes. With roots of the IIR near the unit circle, you require precision, and this is not something you get with integer/fixed-precision computation.
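To make that concrete, here's a rough sketch (my own numbers and code, not from the post), assuming a simple one-pole lowpass: at low cutoffs the pole sits only a handful of 16-bit (Q15) coefficient steps away from the unit circle, so fixed-point coefficients get very coarse exactly where precision matters most.

#include <math.h>
#include <stdio.h>

int main(void) {
    const double PI = 3.14159265358979323846;
    const double fs = 44100.0;
    const double cutoffs[] = { 20.0, 100.0, 1000.0 };
    for (int i = 0; i < 3; i++) {
        /* one-pole lowpass y[n] = (1-p)*x[n] + p*y[n-1] has its pole at p */
        double p = exp(-2.0 * PI * cutoffs[i] / fs);
        /* number of Q15 (16-bit) coefficient steps between the pole and 1.0 */
        double steps = (1.0 - p) * 32768.0;
        printf("fc = %6.0f Hz: pole at %.6f, %4.0f Q15 steps from the unit circle\n",
               cutoffs[i], p, steps);
    }
    return 0;
}

At 20 Hz the pole is only about 93 quantization steps below 1.0, so the realizable cutoff frequencies are a coarse grid.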

Post

Miles1981 wrote: Precision, yes. With roots of the IIR near the unit circle, you require precision, and this is not something you get with integer/fixed-precision computation.
It isn't something you get with floating point either. There are fewer things to worry about, but floating point does have precision issues.

See here for example, pretty eye-opening: http://www.cytomic.com/files/dsp/SVF-vs-DF1.pdf

Post

Thx, I know that paper quite well.
And of course, floating point has precision issues as well! It's not even associative.

Post

neodymDNB wrote:
1. Can you explain to me when and why to use floating point internal processing in an EQ filter?

2. If I convert 16/24-bit audio to 32- or even 64-bit floating point, does it introduce some distortion? I mean, if I keep converting it back and forth many times, will it get progressively different from the original before the conversion?

3. I tried hard to understand floating point but I still don't get it. Is it using fractional numbers like 1.3334 instead of whole numbers like 1, 2, 3?

4. Normal 32-bit PCM has a 192 dB dynamic range, but 32-bit float has about 1680 dB. How is that possible? And if floating point has a bigger dynamic range per bit, why isn't music released as floating point?
tl;dr: Due to the convenience of floating point, we build really fast, cheap floating point hardware these days. That's why we use floating point.

First, realize that "integer" processing of filters is really "fixed point". Biquad coefficients, for instance, are mostly less than 1.0 in magnitude for typical filters (or nearby, below 2.0). You can probably get the sense that if you're adding and multiplying by large integers, the filter output might blow up pretty quickly for a recursive filter, or just stay big for a non-recursive one. A simple way to do fixed-point math on an integer processor is to scale all the coefficients by some fairly large power of two, then shift as needed. For instance, if I want to multiply by 0.5, I could multiply by 32768 (that's 0.5 scaled by 2^16), then right-shift the result 16 places in binary integer math. That leaves you with the integer part, and if you keep the 16 bits you shifted off, that's the fraction. Fixed-point DSP chips like the 56K handle the shifting magic for you.
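As an illustration of that shifting trick, here's a minimal C sketch (my code, with made-up Q16 helper names):

#include <stdio.h>
#include <stdint.h>

/* Q16 fixed point: value = raw / 65536 */
typedef int32_t q16;

static q16 q16_from_double(double x) { return (q16)(x * 65536.0); }

/* multiply two Q16 numbers: the 64-bit product is Q32, shift back to Q16 */
static q16 q16_mul(q16 a, q16 b) {
    return (q16)(((int64_t)a * b) >> 16);
}

int main(void) {
    q16 half = q16_from_double(0.5);      /* raw value 32768 */
    q16 x    = q16_from_double(0.75);     /* raw value 49152 */
    q16 y    = q16_mul(x, half);          /* 0.375 -> raw value 24576 */
    printf("0.75 * 0.5 = %f\n", y / 65536.0);
    return 0;
}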

The problem with fixed point for general math is that you have to make a decision about the number of bits you want to work with, and the largest and smallest numbers you'd like to represent. With floating point, we make the same decisions, except that we dedicate some of the bits to scaling, so that the range can extend from extremely large to extremely small.

There's no free lunch: 32-bit floating point precision (essentially the number of increments the entire range is split into) is about the same as 24-bit fixed point (actually 25 bits, but 32-bit float is common and 24-bit fixed is common, so that's the comparison).

To ease the limitations of 24-bit, the 56K actually has a 56-bit accumulator, allowing you to keep the full-precision result of a 24x24 fixed-point multiply, plus additional guard bits to allow temporary overflow, making the largest value 255 plus 48 bits of fraction, at least for intermediate results. That blows away 32-bit float for precision.

A disadvantage of our comparison 32-bit float is that when you multiply two 32-bit floats (again, 25 bits of precision, resulting in about 50 bits of potential result), the result is immediately truncated to 32-bit float. So it can be significantly less precise. (Sure, just move up to doubles, with 54 bits of precision: they still truncate the result, but that's on par with the best possible case of 24-bit fixed math.)
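A quick sketch of that truncation (my example): compute the same product as float and as double and look at the difference.

#include <stdio.h>

int main(void) {
    float a = 1.1f, b = 1.3f;
    float  pf = a * b;                  /* rounded back to a 24-bit mantissa */
    double pd = (double)a * (double)b;  /* exact: the ~48-bit product fits in a double */
    printf("float  product: %.17g\n", pf);
    printf("double product: %.17g\n", pd);
    printf("difference:     %.17g\n", pd - (double)pf);
    return 0;
}

The difference is tiny but nonzero, and in a recursive filter those tiny errors circulate.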

The big advantage of floating point is the automatic scaling. It's like a ruler that you can stretch or compress: the same number of tick marks, but you control the range they're spread over. It's a lot easier to work with than fixed point. With fixed point, an overflow results in clipping, and an underflow results in zero. Think of it this way: if you take an integer, shift it right 32 bits, then shift it left 32 bits, in a sense you've done nothing mathematically, yet you've just wiped out your integer.
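Illustrating that contrast (my sketch; using 16-bit shifts, since shifting a 32-bit int by 32 is undefined in C):

#include <stdio.h>

int main(void) {
    int x = 12345;
    /* fixed point: shifting right discards the low bits permanently */
    int wiped = (x >> 16) << 16;            /* 0, because 12345 < 65536 */
    /* floating point: scaling by a power of two just changes the exponent */
    float f = 12345.0f;
    float kept = (f / 65536.0f) * 65536.0f; /* exactly 12345.0 again */
    printf("integer after >>16 then <<16: %d\n", wiped);
    printf("float after /65536 then *65536: %g\n", kept);
    return 0;
}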

However, floating point is not without pitfalls. You can represent a huge number with floating point, or a tiny number. But if you add them together, the tiny number will just go away (there aren't enough bits in the floating point representation). Say you have a 5-digit decimal floating point system, easy to imagine as a spreadsheet formatted to show 5 significant digits. In one cell you put "32768", in another "0.01", then in a third cell you put a formula to sum the two. The answer would be "32768" (while adding 768 and 0.01 would give 768.01). Often that's fine, because by definition we only care about 5 digits. But in some cases the order of your math operations can result in huge, unexpected error, whereas with fixed point the limitations are more obvious and you rarely get caught unaware.
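Here's the binary version of that spreadsheet experiment, as a small sketch of my own:

#include <stdio.h>

int main(void) {
    float big = 1.0e8f, tiny = 1.0f;
    /* 1e8 needs 27 bits; a float mantissa has 24, so the step near 1e8 is 8 */
    float sum = big + tiny;   /* tiny is absorbed: the sum rounds back to 1e8 */
    printf("1e8 + 1 == 1e8 ? %s\n", (sum == big) ? "yes" : "no");  /* yes */
    return 0;
}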
My audio DSP blog: earlevel.com

Post

Miles1981 wrote: And of course, floating point has precision issues as well! It's not even associative.
Actually it is, as long as the multiplicands have few enough significant bits set in the mantissa.

Post

camsr wrote:
Miles1981 wrote: And of course, floating point has precision issues as well! It's not even associative.
Actually it is, as long as the multiplicands have few enough significant bits set in the mantissa.
Which means that it generally is not.
My point is that all our usual assumptions about math don't carry over as easily to computers.
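For example (my sketch, not from the thread): the same three floats, grouped two ways, give two different answers.

#include <stdio.h>

int main(void) {
    float big = 1.0e8f, tiny = 1.0f;
    /* same three operands, two groupings, two different results */
    printf("(1 + 1e8) - 1e8 = %g\n", (tiny + big) - big);  /* 0 */
    printf("1 + (1e8 - 1e8) = %g\n", tiny + (big - big));  /* 1 */
    return 0;
}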

Post

Multiplication is probably the best use of floating point. To show you an example, let's multiply 2 and 1.5.
The binary representations (sign, exponent, mantissa) of these numbers are:
2 = 0 10000000 00000000000000000000000
1.5 = 0 01111111 10000000000000000000000
and the result:
3 = 0 10000000 10000000000000000000000

Since one of the multiplicands (the 2) had no mantissa bits set, there was no change to the precision of the result; it was only "scaled" by the exponent.

An easy way to predict when a floating-point multiplication will overflow its mantissa is to add the least-significant-bit positions of the two operands. In the example, 1.5 is only one bit deep into the mantissa, and 0 + 1 = 1. If we multiply 3 and 1.5, the result's mantissa will be two bits deep.
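A tiny C check of that example (my sketch):

#include <stdio.h>

int main(void) {
    float a = 2.0f, b = 1.5f;
    /* both operands have short mantissas, so the product is exact */
    printf("2 * 1.5 == 3 exactly? %s\n", (a * b == 3.0f) ? "yes" : "no");
    /* 3 * 1.5 = 4.5 is still exact: the result's mantissa is only a few bits deep */
    printf("3 * 1.5 = %g\n", 3.0f * 1.5f);
    return 0;
}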

Post

Just add a few numbers together, and the operations are obviously not associative.
I'm not trying to make a point; I know some operations are associative, but others are not. Lots of young developers get caught out by this, so the sooner they know about the issue the better.

But I definitely can't see the point of your post.

Post

Miles1981 wrote:
But I definitely can't see the point of your post.
Then glean absolutely nothing from it!
