## dangerous floating point math

17 posts
- Mr Entertainment
- 11083 posts since 29 Apr, 2002, from i might peeramid

seeking insight here....

i've had a lot of problems with something i didn't expect to. my envelope code has a 32 bit float "level" variable that ranges from 0.0f to 1.0f

the release is performed, imo predictably by decrementing. level -= dec. a check to shut the envelope off is then performed.. this used to be if (level <= 0.f) envelope = off, but i've had significant issues with the code not reaching this point. if i increase the threshold to (level < .0005f) the performance is much better, notes rarely hang. but, this is of course lousy resolution.

okay, so f**k me. i haven't made any great effort to educate myself on the thresholds of floating point, i know that after a certain size, precision is lost (eg. integers over a certain size cannot be precisely indicated). the solution is to scale my envelope to 100000 or something instead of 1.

i'm guessing that 32 bit floating point precision is resulting in something like small#a minus small#b = no discernible difference, therefore floating point calculation returns the same, there is effectively no subtraction. that seems reasonable to me, but i wouldn't expect to reach this threshold until about six decimal places.
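one way to test that guess: a minimal sketch, assuming IEEE 754 single precision with round-to-nearest. `subtraction_is_absorbed` is a hypothetical helper, not part of the envelope code described here. it reports whether a decrement vanishes entirely at a given level.

```cpp
#include <cassert>

// Hypothetical helper: true when `level - dec` rounds straight back to
// `level`, i.e. the decrement is lost entirely.
bool subtraction_is_absorbed(float level, float dec)
{
    assert(dec > 0.0f);
    const float next = level - dec;  // rounded to the nearest float
    return next == level;
}
```

under those assumptions `subtraction_is_absorbed(1.0f, 1e-8f)` is true, while `subtraction_is_absorbed(1e-3f, 1e-8f)` is false: floats are spaced about 6e-8 apart just below 1.0 and get denser toward zero, so a decrement that moves the level at 1.0 should keep moving it all the way down.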

anyone?

you come and go, you come and go. amitabha xoxos.net free vst. neither a follower nor a leader be

tagore "where roads are made i lose my way"

- KVRist
- 76 posts since 25 Sep, 2001, from Paris, France

You might want to have a look at this http://en.cppreference.com/w/cpp/types/ ... ts/epsilon

On my machine std::numeric_limits<f32>::epsilon() is 1.19209290e-007 (the spec says it should be smaller than 1e-5).

Maybe the floating point precision compiler settings have an effect on this too, I haven't checked (see for example http://msdn.microsoft.com/library/e7s85ffb.aspx)
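A small sketch for checking this on your own machine, assuming `f32` above is a typedef for `float` and that the platform uses IEEE 754 floats. The helper names `machine_epsilon` and `visible_next_to_one` are made up for illustration.

```cpp
#include <cassert>
#include <limits>

// Machine epsilon: the gap between 1.0 and the next representable value.
template <typename T>
T machine_epsilon()
{
    static_assert(std::numeric_limits<T>::is_iec559, "assumes IEEE 754");
    return std::numeric_limits<T>::epsilon();
}

// True when adding x to 1.0f produces a different float.
bool visible_next_to_one(float x)
{
    assert(x >= 0.0f);
    const float sum = 1.0f + x;
    return sum != 1.0f;
}
```

With round-to-nearest, `visible_next_to_one(machine_epsilon<float>())` should be true and `visible_next_to_one(machine_epsilon<float>() / 2)` false. Note that epsilon describes the spacing around 1.0 only; values closer to zero are spaced more finely.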

**Banned**

xoxos wrote:the solution is to scale my envelope to 100000 or something instead of 1.

No.

What range is dec? What's the lowest value it ever has? How do you calculate it?

- KVRAF
- 5633 posts since 16 Feb, 2005

Are you scaling the dec value per increment? If you are multiplying dec by a value smaller than one anywhere, then you are slowing and possibly sustaining the envelope. It looks linear so far. What kind of dynamic range are you expecting? Single or double precision?

- KVRAF
- 5633 posts since 16 Feb, 2005

http://www.h-schmidt.net/FloatConverter/IEEE754.html

You can figure a threshold with a defined dynamic range by counting about 6 dB for every power-of-two step of the exponent. An exponent of 2^-24 would be about -144 dBFS. Then just use the hex value as a constant instead of the decimal literal.
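A rough sketch of that idea, assuming 32-bit IEEE 754 floats. The names `power_of_two_to_db`, `float_from_bits` and `kReleaseFloor` are hypothetical, and 2^-24 is only the example exponent from above, not a recommended threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Level of 2^exponent in dB relative to full scale (1.0).
double power_of_two_to_db(int exponent)
{
    assert(exponent <= 0);  // at or below full scale
    return 20.0 * std::log10(std::ldexp(1.0, exponent));
}

// Build a float from its raw IEEE 754 bit pattern.
float float_from_bits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(bits), "assumes 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// 0x33800000: sign 0, biased exponent 103 (127 - 24), mantissa 0 -> 2^-24.
const float kReleaseFloor = float_from_bits(0x33800000u);
```

Each power of two is about 6.02 dB, so `power_of_two_to_db(-24)` comes out near -144.5. The envelope check would then compare against `kReleaseFloor` instead of 0.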

- KVRian
- 741 posts since 23 Feb, 2012

I suspect there is a bug in your control logic which is not related to precision issues. btw, the higher the scale, the bigger the epsilon size. Scaling the range up will not help.

However, you can reduce precision problems to irrelevant levels by using double precision floats, careful dithering and/or by working in the dB domain (btw, the latter is practically a "must" for accurate followers in the audio context).
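To illustrate why scaling the range up does not buy resolution, here is a minimal sketch assuming IEEE 754 single precision. `spacing_above` is a hypothetical helper that measures the gap to the next representable float.

```cpp
#include <cassert>
#include <cmath>

// Gap between x and the next float above it (one unit in the last place).
float spacing_above(float x)
{
    assert(std::isfinite(x));
    return std::nextafter(x, INFINITY) - x;
}
```

`spacing_above(1.0f)` is 2^-23 (about 1.2e-7), while `spacing_above(100000.0f)` is 2^-7 (0.0078125). Relative to the full range, both are on the order of 1e-7, so a 0..100000 envelope is no finer than a 0..1 one.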

Fabien from Tokyo Dawn Records

Check out my audio processors over at the Tokyo Dawn Labs!

- Mr Entertainment
- 11083 posts since 29 Apr, 2002, from i might peeramid

thanks for the references.. i've seen the information, hadn't internalised it as it hasn't previously been a necessary part of discerning whether an algorithm works..

increment value is 1.f / (env time in samples) so at 32 bits it's bound to err.

gotta go fix something to lessen the amount of yelling at me before further consideration is possible..

you come and go, you come and go. amitabha xoxos.net free vst. neither a follower nor a leader be

tagore "where roads are made i lose my way"

- KVRist
- 76 posts since 25 Sep, 2001, from Paris, France

xoxos wrote:increment value is 1.f / (env time in samples) so at 32 bits it's bound to err.

I believe 64-bit doubles are mandatory here too, as summation errors would go through the roof in single precision anyway, even if you found a way to make that subtraction accurate at low values.

More interesting reading (for me & others too I suppose )

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

http://www.drdobbs.com/floating-point-s ... /184403224
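For reference, a minimal sketch of Kahan (compensated) summation next to a plain running sum, both in single precision. The function names are made up, and the sketch assumes the compiler keeps strict floating-point semantics (flags such as `-ffast-math` may optimise the compensation away).

```cpp
#include <cassert>
#include <cstddef>

// Plain running sum: every addition rounds, and the errors pile up.
float naive_sum(float value, std::size_t count)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        sum += value;
    return sum;
}

// Kahan summation: `carry` holds the low-order bits that the previous
// addition rounded away and feeds them back into the next one.
float kahan_sum(float value, std::size_t count)
{
    assert(count > 0);
    float sum = 0.0f;
    float carry = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
    {
        const float y = value - carry;  // re-apply the lost bits
        const float t = sum + y;        // low bits of y are lost here...
        carry = (t - sum) - y;          // ...and recovered here
        sum = t;
    }
    return sum;
}
```

Adding `1e-6f` a million times, the compensated sum should stay within about one part per million of 1.0, while the naive sum drifts visibly. The cost is three extra additions per step.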

**Banned**

xoxos wrote:increment value is 1.f / (env time in samples) so at 32 bits it's bound to err.

That should be fine actually. You have 24 bits of mantissa in a 32 bit float. That's 174 seconds at 96 kHz.

But I was asking for the ACTUAL value of dec. Run it in the debugger and see what it really is. When in doubt, get empirical.
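The arithmetic behind that figure, as a small sketch. It assumes IEEE 754 single precision, and the helper names are hypothetical.

```cpp
#include <cassert>

// Seconds needed to step through every value of a 24-bit significand
// at one step per sample.
double seconds_to_count_24_bits(double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    const double steps = 16777216.0;  // 2^24
    return steps / sample_rate_hz;
}

// True when adding 1 to x still changes x in single precision.
bool can_count_past(float x)
{
    const float next = x + 1.0f;
    return next != x;
}
```

`seconds_to_count_24_bits(96000.0)` is about 174.76. `can_count_past(16777215.0f)` is true but `can_count_past(16777216.0f)` is false, which is where the 24-bit limit comes from.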

- KVRist
- 76 posts since 25 Sep, 2001, from Paris, France

AdmiralQuality wrote:xoxos wrote:increment value is 1.f / (env time in samples) so at 32 bits it's bound to err.

That should be fine actually. You have 24 bits of mantissa in a 32 bit float. That's 174 seconds at 96 kHz.

But I was asking for the ACTUAL value of dec. Run it in the debugger and see what it really is. When in doubt, get empirical.

Except it's not so simple as

`(thresh * 96000smp) <= 2^24 ~= 16.7M (approx 1/std::numeric_limits<float>::epsilon())`

That would assume rounding errors don't propagate at each step. You'd have to actually store ints for that to work, or use special summation formulae.

I just did a quick test:

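```cpp
#include <cstdio>

int main()
{
    float fSum = 0.f;
    const float fVal = 1e-6f;
    enum { nSum = 1000000 };
    for (unsigned int i = 0; i < nSum; ++i)
    {
        fSum += fVal;  // each addition rounds; the error accumulates
    }
    std::printf("%g\n", fSum);
    return 0;
}
```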

At the end fSum equals 1.00904, so a delta of 0.009039 -> roughly 1% error.

Last edited by lorcan on Tue Feb 19, 2013 11:44 am, edited 3 times in total.

**Banned**

lorcan wrote:AdmiralQuality wrote:xoxos wrote:increment value is 1.f / (env time in samples) so at 32 bits it's bound to err.

That should be fine actually. You have 24 bits of mantissa in a 32 bit float. That's 174 seconds at 96 kHz.

But I was asking for the ACTUAL value of dec. Run it in the debugger and see what it really is. When in doubt, get empirical.

Except it's not so simple as

`(thresh * 96000smp) <= 2^24 ~= 1.67M (approx 1/std::numeric_limits<float>::epsilon())`

That would assume rounding errors don't propagate at each step. You'd have to actually store ints for that to work, or use special summation formulae.

2^24 is 16 million.

Anyway, I'd have to sit down and actually think to figure this out as the exponent will come into it as well (hence my advice to check it empirically) but 32-bit floats should deal with this kind of range just fine.

You could always calculate it the other way, by counting samples, if you don't mind a multiplication at every sample. But I'd take a look at what the actual values are first, I suspect the issue isn't what xoxos thinks it is.
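A sketch of what counting samples could look like, assuming a linear release. `CountedRelease` and its members are hypothetical names, not anyone's actual envelope code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical release stage driven by an integer sample counter.
struct CountedRelease
{
    float scale;                // start level divided by release length
    std::uint32_t length;       // release time in samples
    std::uint32_t elapsed = 0;  // samples processed so far

    CountedRelease(float start_level, std::uint32_t samples)
        : scale(start_level / static_cast<float>(samples)), length(samples)
    {
        assert(samples > 0);
    }

    // The integer comparison cannot be fooled by float rounding.
    bool finished() const { return elapsed >= length; }

    // One multiplication per sample. Rounding cannot accumulate because
    // the level is recomputed from the counter every time.
    float next()
    {
        if (finished())
            return 0.0f;
        ++elapsed;
        return scale * static_cast<float>(length - elapsed);
    }
};
```

For example, `CountedRelease r(1.0f, 4)` yields 0.75, 0.5, 0.25, 0.0 from successive `r.next()` calls, after which `r.finished()` is true. The trade-off is the per-sample multiply and int-to-float conversion in place of a single subtraction.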

- KVRist
- 76 posts since 25 Sep, 2001, from Paris, France

AdmiralQuality wrote:

2^24 is 16 million.

Anyway, I'd have to sit down and actually think to figure this out as the exponent will come into it as well (hence my advice to check it empirically) but 32-bit floats should deal with this kind of range just fine.

You could always calculate it the other way, by counting samples, if you don't mind a multiplication at every sample. But I'd take a look at what the actual values are first, I suspect the issue isn't what xoxos thinks it is.

Yep, sorry I was a bit too quick there, the rest of the calculation holds though.

I just edited my prev post with a test.

Cheers

**Banned**

Small error is irrelevant to what he's trying to achieve. As long as it continues to decrement it has to pass zero and go negative at some point. He feels it's never happening though.
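An empirical check along those lines, as a minimal sketch. It assumes the envelope starts at 1.0 and uses the `level <= 0` shut-off test from the original post. `steps_until_off` is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// Count how many `level -= dec` steps it takes to get from 1.0 to <= 0.
// Gives up after `max_steps` so a stalled envelope cannot hang the caller.
std::uint64_t steps_until_off(float dec, std::uint64_t max_steps)
{
    assert(max_steps > 0);
    float level = 1.0f;
    std::uint64_t steps = 0;
    while (level > 0.0f && steps < max_steps)
    {
        level -= dec;
        ++steps;
    }
    return steps;
}
```

With `dec = 1.0f / 960000.0f` (ten seconds at 96 kHz) the loop should finish within a few percent of 960000 steps. A `dec` of zero, or one small enough to be absorbed at 1.0, is the case that runs into `max_steps`.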

Again, what's dec, EXACTLY?

- KVRAF
- 5633 posts since 16 Feb, 2005

lorcan wrote:xoxos wrote:increment value is 1.f / (env time in samples) so at 32 bits it's bound to err.

I believe 64-bit doubles are mandatory here too, as summation errors would go through the roof in single precision anyway, even if you found a way to make that subtraction accurate at low values.

More interesting reading (for me & others too I suppose )

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

http://www.drdobbs.com/floating-point-s ... /184403224

Truncating the result should provide the most useful behavior for an envelope. I suppose the error is not as severe as the result in this case.

Last edited by camsr on Tue Feb 19, 2013 2:52 pm, edited 1 time in total.

**Banned**

I did it the same way and it worked for me. (I've since changed everything to doubles but never had any problem when it was all 32 bit.)

You're thinking too much. You can definitely count to 16 million with floats. The problem is something ELSE. Probably dec is zero or something.

Wait, is this not just the envelope counter but also the output level? If so, what happens if a long release time activates when a decay or sustain level was almost at zero, does it start counting down from THERE? Or from 1? (Another way of asking, are they rate envelope segments or time envelope segments?)
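To make the rate-versus-time distinction concrete, here is a sketch of the two ways the decrement could be computed. Both function names are hypothetical, and which one xoxos's code corresponds to is exactly the open question.

```cpp
#include <cassert>

// Rate segment: fixed slope, sized for a full 1.0 -> 0.0 fall.
// Starting from a lower level simply reaches zero sooner.
float rate_decrement(float release_samples)
{
    assert(release_samples > 0.0f);
    return 1.0f / release_samples;
}

// Time segment: slope scaled so the fall from `start_level` takes the
// whole release time. A start level near zero gives a tiny decrement.
float time_decrement(float start_level, float release_samples)
{
    assert(release_samples > 0.0f);
    return start_level / release_samples;
}
```

With a time segment, a release that starts from a sustain level of 0 gets a decrement of exactly 0, so any shut-off test that waits for the level to move would never fire.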
