Integer is King? - final thoughts about the EQ challenge

DSP, Plugin and Host development discussion.

Post

well, mad, if you'd care to listen to the example renderings from the last posted code, you'll hear that the difference is in fact very audible, and you even get a 6 dB difference in peak amplitude.

http://xhip.cjb.net/temp/public/difference_signal.wav

this is (float - int). float and double produce bit-for-bit matching results in the 'short' output.

what this seems to show is that there isn't really much difference in this case, and probably many other cases, between single and double. there is a major difference in many cases between float and int, though.

either way, discounting a difference because you do not notice it or because it does not influence you does not prove that it is not important :P

sam; i noticed last night my version was still not working. i could make it give an error in the lsb, or in the 16th bit, by changing things around, then i gave up on it again. yours seems to still give an error in the lsb (of -1) compared to the x86 instruction for negative values. this is pretty tricky stuff, and it definitely will not help anybody to write faster code. it is interesting however because we could use the same technique to scale up to as many bits as we like. the fastest method would be to use the x86 32x32 multiply instruction (which gives a 64 bit result) to build a 64x64 multiply, and then that function could be used to build a 128x128 multiply, and so on. so it could potentially be worth the effort, if not just fun to mess with since it makes you feel like a binary wizard. :hihi: ...or would if we could get it right :cry:

i think the error in your version may be due to the fact -n = ~n+1. so you can fix it by simply adding the line:

acc -= (~acc >> 31) & 1 & acc;

...i think.
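To make the -n = ~n+1 point concrete, here is a minimal sketch in C. The function names are made up for illustration and are not from the posted code. It only shows that restoring the sign by inverting the bits lands one LSB below true negation, which would be consistent with the -1 error described above.

```c
#include <stdint.h>

/* Hypothetical helper: "negate" by inverting the bits only (one's complement).
   For a magnitude m this yields -m - 1, i.e. one LSB too low.
   The unsigned-to-signed conversion wraps on mainstream compilers. */
static int32_t negate_by_invert(int32_t m)
{
    return (int32_t)~(uint32_t)m;
}

/* Hypothetical helper: two's complement negation, ~m + 1, done in unsigned
   arithmetic so that nothing overflows along the way. */
static int32_t negate_twos_complement(int32_t m)
{
    return (int32_t)(~(uint32_t)m + 1u);
}
```

For example, negate_by_invert(5) gives -6, while negate_twos_complement(5) gives -5.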

there should be a more simple way to do this.

...nope, rofl.

a good test function for this:

Code: Select all

 long n = 1;
 while (n != 0)
 {
  long reference = m32x(0x7FFFFFFF, n);
  long result = m32l(0x7FFFFFFF, n);
  long delta = reference - result;
  /* %ld, not %i: the arguments are long */
  if (delta) { printf("err %ld != %ld\n", reference, result); break; }
  n++;
 } 

Post

I just feel the need to say something before this investigation gets washed away by IntegerChallengeII.

From listening to the files I think you guys are really onto something.

I hear the difference between the two output files as very specific to float and int.
The kind of "body" that the int version has here and the float version lacks is something I have been looking for in floating-point audio software for a long time.

please keep it up

thorK

Post

Ok, think I've found it...
aciddose wrote:what this seems to show is that there isnt really much difference in this case and probably many other cases between single and double. there is a major difference in many cases between float and int, though.
This set alarm bells ringing: an error that doesn't reduce as the precision goes up? Not even the "creeping error" theorem works for that, because the cumulative error would accumulate less quickly with more precision.

Tried extending the int resolution.. didn't help.

Then had a closer look at both output waveforms, next to the input waveforms.

Something strange is happening at the step: a major input level change is causing no immediate change in the output level on the int version. Now I know it's low pass filtered, but not that much!

Looked at the code again

In the float version we have

Code: Select all

 b = sat(b, 0.5);
 b += f * h;
and in the int version...

Code: Select all

 b = sat(b, 1073741824); //0.5
 b += 2*m32(f, h);
Which at first glance look the same, BUT... what happens in the int version if b is saturated at 0.5, and f * h is greater than 0.5??

It wraps, not good. Same when we're dealing with -0.5.
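A minimal sketch of that wrap, assuming the values are 1:31 fixed point held in a 32-bit register. The function names are hypothetical, and the 32-bit add is done through unsigned arithmetic because signed overflow is undefined in C.

```c
#include <stdint.h>

#define Q31_HALF 1073741824 /* 0.5 in 1:31 fixed point, the sat() level above */

/* 32-bit accumulator: the sum is formed modulo 2^32, as a 32-bit register
   would do it. The conversion back to int32_t wraps on mainstream compilers. */
static int32_t add_wrap32(int32_t b, int32_t delta)
{
    return (int32_t)((uint32_t)b + (uint32_t)delta);
}

/* The same add with the values held in 64-bit variables: the sum can pass
   1.0 (2^31) without wrapping. */
static int64_t add_headroom64(int64_t b, int64_t delta)
{
    return b + delta;
}
```

With b at Q31_HALF and a step of about 0.6 (1288490188), add_wrap32 comes back negative, while add_headroom64 gives 2362232012, roughly 1.1.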

So I wrote some fixed point code to give the integer version some headroom

Code: Select all

__int64 m32p31(__int64 a, __int64 b)
{
	__int64 r = a * b;
	return r >> 31;
}

long dom32p31(double i, double freq)
{
	// sat() and feedback are not defined in this snippet
	// (assumed to come from the earlier posted code).
	__int64 input = (long)(i * 2147483647.0); // input sample as 1:31 fixed point

	static __int64 b = 0, l = 0; // filter state, kept between calls
	__int64 f = (__int64)(freq * 2147483647.0); // scale integrator rate coeff
	__int64 q = (__int64)((1.0 - feedback) * 2147483647.0);

	__int64 h;

	// these scalings (2*, 3*) are applied to the coefficients
	// pre-process in the float version.

	// first pass
	h = input - l;
	b = sat(b, 1073741824); // 0.5
	b += 2*m32p31(f, h);
	l += 2*m32p31(f, b - 2*m32p31(q, l));

	// second pass, identical to the first
	h = input - l;
	b = sat(b, 1073741824); // 0.5
	b += 2*m32p31(f, h);
	l += 2*m32p31(f, b - 2*m32p31(q, l));

	return l; // return the lowpass
}
Run that and you find the difference between double and int is just some low level noise (around 1-2 bits in magnitude).
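The code above leans on sat(), which isn't shown in this post. As a rough sketch only, here are stand-ins for the helpers, assuming sat() is a plain symmetric hard clip (the real one may well be different). The names sat_q31, mul_q31 and to_q31 are made up for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for sat(): symmetric hard clip to [-limit, +limit].
   This is an assumption; the thread's actual sat() is not shown. */
static int64_t sat_q31(int64_t x, int64_t limit)
{
    if (x > limit) return limit;
    if (x < -limit) return -limit;
    return x;
}

/* 1:31 fixed point multiply in a 64-bit container, same arithmetic as
   m32p31 above: full product, then shift right by 31. */
static int64_t mul_q31(int64_t a, int64_t b)
{
    return (a * b) >> 31;
}

/* Convert a value in [-1.0, 1.0] to 1:31 fixed point with the same scale
   factor the posted code uses. */
static int64_t to_q31(double x)
{
    return (int64_t)(x * 2147483647.0);
}
```

As a sanity check, mul_q31(1073741824, 1073741824) is 0.5 * 0.5 and comes out as 536870912, which is 0.25.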

Post

sambean wrote:It's quite an interesting challenge to do a 32 bit multiply using only a 32 bit reg..

Code: Select all


ulong aa = abs(a)
ulong ab = abs(b)

ulong ac = 0

ac += (loword(aa) * loword(ab))
ac = ac >> 16
ac += (loword(aa) * hiword(ab))
ac += (hiword(aa) * loword(ab))
ac = ac >> 16
ac += (hiword(aa) * hiword(ab))

if (((a xor b) >> 31) = 1) ac = ac xor FFFFFFFFh
If you do the shift on the accumulator *after* the accumulation, you will reduce error. Let the accumulator follow the full precision of the multiplies, rather than shifting each product down to fit a fixed accumulator.

I.e., if you shift the result of the 16x16 bit multiply before adding it to the accumulator, you are dropping 16 bits which could have caused a carry up into the remaining bits.

The reason neither your nor aciddose's method will work with negative numbers is that you are not using the proper two's complement of the lower 16 bits. When you bitmask the bottom 16 bits with a straight FFFFh you are turning them into a positive number. It actually needs to be sign extended up to 32 bits if it is negative. E.g.:

FFFF,FF67h = FFFF,FF67h
FF01,0203h = FFFF,0203h

The other option is to abs both inputs. Then the sign of the result will be the xor of the two original sign bits.

The advantage of that method is that you don't need to worry about overflow when you multiply the two low words.
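Putting those points together (abs both inputs, keep the carries, fix the sign at the end), a sketch in C might look like this. It assumes the goal is the high 32 bits of the 64-bit product. The function names are invented for the example, and the 64-bit version is only there as a reference to check against.

```c
#include <stdint.h>

/* Reference: high 32 bits of the signed 64-bit product. Assumes >> on a
   negative int64_t is an arithmetic shift, as on mainstream compilers. */
static int32_t mul32_hi_ref(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Same result using only 32-bit variables and 16x16 partial products. */
static int32_t mul32_hi_limbs(int32_t a, int32_t b)
{
    /* Magnitudes, computed in unsigned arithmetic so INT32_MIN is safe. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t al = ua & 0xFFFFu, ah = ua >> 16;
    uint32_t bl = ub & 0xFFFFu, bh = ub >> 16;

    uint32_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;

    /* Sum everything that lands in bits 16..31 before shifting, so the
       carry into the upper word is kept instead of being thrown away. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);
    uint32_t lo = (mid << 16) | (ll & 0xFFFFu);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    /* Result sign is the xor of the input signs. Negate the hi:lo pair as
       ~x + 1, carrying from the low word into the high word. */
    if ((a ^ b) < 0) {
        lo = ~lo + 1u;
        hi = ~hi + (lo == 0u ? 1u : 0u);
    }
    return (int32_t)hi; /* wraps to negative on mainstream compilers */
}
```

Note the final negate is a full ~x + 1 across both words, not just an xor with FFFFFFFFh.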

Post

float is Queen!

Image

Post

a little more adjustment to make the two processes more equal, and the differences come down to a single bit, and looking at it I think it's probably just down to some difference between rounding and truncation somewhere.

Basically, once you fix the bug in the integer implementation, for this filter with this input stream, with a 16 bit output, there's no difference between float, double, 8:24 fixed point and 8:56 fixed point.

Post

Thank you Jon Hodgson! (I'm too lazy and busy fighting with my own bug to get into this, so again, THANKS!)

Post

But if you use a 24 bit output, then the difference between 8:56 and 8:24 is considerably bigger than between 8:56 and float or double.
So in fact in this case it looks like 32 bit float is more accurate than 32 bit fixed point, though maybe I could get away with 2:30 fixed point and get more out of it.

Nope, I could be getting something wrong, but 2:30 fixed point is still coming out as more different from 8:56 fixed point than 32 bit float is (the errors seem pretty spread out across the spectrum though, so it's not like the response of the filter changes drastically, just the quality of the output).

Post

JonHodgson wrote: Nope, I could be getting something wrong, but 2:30 fixed point is still coming out as more different from 8:56 fixed point than 32 bit float is (the errors seem pretty spread out across the spectrum though, so it's not like the response of the filter changes drastically, just the quality of the output).
Have you tried turning the FPU precision down to 32 bit? It could be benefiting from 80 bit precision even if the state vars are being thunked down to 32 bit when saved.

Post

nollock wrote:
JonHodgson wrote: Nope, I could be getting something wrong, but 2:30 fixed point is still coming out as more different from 8:56 fixed point than 32 bit float is (the errors seem pretty spread out across the spectrum though, so it's not like the response of the filter changes drastically, just the quality of the output).
Have you tried turning the FPU precision down to 32 bit? It could be benefiting from 80 bit precision even if the state vars are being thunked down to 32 bit when saved.
Ah yes, good point. Any idea if I can do that with the FPU? I know I can do it using SSE instructions.

Post

Yes. There are 64bit and 32bit rounding modes for the x87 fpu, you can turn them on at least in assembly, probably by changing the fpu setup flags. Dunno if there's a windows API function to do it for you.
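For what it's worth, here is a sketch of flipping the x87 precision-control bits from C rather than assembly. It assumes glibc on x86, where `<fpu_control.h>` provides `_FPU_GETCW` and `_FPU_SETCW`. The wrapper names are made up, and on other platforms the functions just report that nothing was done. On MSVC for 32-bit x86, the CRT's `_controlfp` with `_PC_24` and `_MCW_PC` should do the equivalent. Keep in mind that this only affects x87 instructions, so code compiled to SSE is untouched.

```c
#include <stdint.h>

#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_X87_CW 1
#else
#define HAVE_X87_CW 0
#endif

/* Switch the x87 precision-control field to single (24-bit significand).
   Stores the previous control word in *saved. Returns 1 on success, or 0
   if this build has no access to the x87 control word. */
static int x87_set_single_precision(unsigned *saved)
{
#if HAVE_X87_CW
    fpu_control_t cw;
    _FPU_GETCW(cw);
    *saved = cw;
    cw = (fpu_control_t)((cw & ~_FPU_EXTENDED) | _FPU_SINGLE);
    _FPU_SETCW(cw);
    return 1;
#else
    (void)saved;
    return 0;
#endif
}

/* Put back a control word previously saved by the function above. */
static void x87_restore(unsigned saved)
{
#if HAVE_X87_CW
    fpu_control_t cw = (fpu_control_t)saved;
    _FPU_SETCW(cw);
#else
    (void)saved;
#endif
}
```

Wrap the filter loop between the two calls, and restore afterwards, since library code generally expects the default precision setting.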

Post

there are definitely more issues if I limit intermediate calculations to 32 bit floats, though I need to look into it more. What's the cutoff point of this filter? It doesn't seem to be doing all that much cutting.

Post

ok, solved it.
When what you're looking at just doesn't seem to make sense...
you may not be seeing what you think you are seeing.
In this case setting the rounding mode to 32 bits was screwing up the input integrator so it was drifting up over time (every reset 2.0 is subtracted from it, it isn't reset to an absolute value), and that was throwing everything out.

Keep that integrator going at full resolution (to keep the inputs the same) and then the 32 bit rounding makes no difference inside the filter that I can see.
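The input generator itself isn't posted here, so this is only a guess at its shape: an accumulator that adds an increment every sample and wraps by subtracting 2.0 rather than being reset to a fixed value. With that assumption, a sketch for comparing single and double precision state looks like this (function names are hypothetical).

```c
/* Integrator that wraps by subtracting 2.0 instead of resetting to an
   absolute value, so any rounding error in acc is carried over the wrap. */
static float integrate_float(float inc, long steps)
{
    float acc = 0.0f;
    for (long i = 0; i < steps; i++) {
        acc += inc;                    /* rounded to float every sample */
        if (acc >= 1.0f) acc -= 2.0f;  /* relative reset keeps the error */
    }
    return acc;
}

/* The same integrator with double precision state, for comparison. */
static double integrate_double(double inc, long steps)
{
    double acc = 0.0;
    for (long i = 0; i < steps; i++) {
        acc += inc;
        if (acc >= 1.0) acc -= 2.0;
    }
    return acc;
}
```

Running both with the same increment for more and more samples and printing the difference shows how far the single precision state has wandered from the double one.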

Post

hm, curiously these findings match with thorkz's results: the less "accurate" integer method sounds better?

they also match with what i've stated many times: two exact implementations will produce two exactly duplicate results. duplicate process = duplicate result.

also: working with each format will produce distinct implementations which can be distinctly optimized and will have distinct properties.

apparently none of the float specific error behaviours have anything to do with the filter stuff i experienced long ago. perhaps my float implementation contained bugs while my int implementation was perfect, or perhaps the int implementation was more capable of handling the bugs. either way i find it funny since float is supposed to be much more easy to write code for.

i'm not surprised that most of the effects i theorized about do not make a difference in these cases. you'd have to find extreme corner cases to show them. although, i note we still haven't done anything except run a very simple calculation showing that in some conditions there can be minor differences, and in some conditions there can be major differences.

we need more extensive experiments in order to provide evidence for anything we do not already know. so far this example has only shown two things: the int implementation is more difficult to manage and the results are less accurate on average.

i'm curious about benchmarking between these implementations though. i think we'll find that with naive implementations int is slightly faster. with denormal handling for the float version, they should be equal. with float specific optimizations vs. int specific optimizations, should be equal. just my theories though, and so far we can see they've been wrong before :hihi:

Post

aciddose wrote:hm, curiously these findings match with thorkz's results: the less "accurate" integer method sounds better?
Thorkz's listening tests were based on your initial code, which had a bug, so you can't draw any conclusions about which number system sounds better.
aciddose wrote:also: working with each format will produce distinct implementations which can be distinctly optimized and will have distinct properties.
Well, in the case of your posted filter, apparently not, as far as the properties go. Of course this will depend on how far your optimizations take your algorithms from numerical equality.
aciddose wrote:apparently none of the float specific error behaviours have anything to do with the filter stuff i experienced long ago. perhaps my float implementation contained bugs while my int implementation was perfect, or perhaps the int implementation was more capable of handling the bugs. either way i find it funny since float is supposed to be much more easy to write code for.
Or apparently, as in this case, your INT versions had bugs? You certainly had less trouble coding the float version in this case.
aciddose wrote:i'm not surprised that most of the effects i theorized about do not make a difference in these cases. you'd have to find extreme corner cases to show them. although, i note we still haven't done anything except run a very simple calculation showing that in some conditions there can be minor differences, and in some conditions there can be major differences.
Well, not much in the way of major differences here to be honest. At high word sizes they're pretty much identical, and as the word sizes reduce, the results have more and more errors in them, the int version failing faster.
aciddose wrote:we need more extensive experiments in order to provide evidence for anything we do not already know. so far this example has only shown two things: the int implementation is more difficult to manage and the results are less accurate on average.
The peak errors are worse in this case too.
aciddose wrote:i'm curious about benchmarking between these implementations though. i think we'll find that with naive implementations int is slightly faster. with denormal handling for the float version, they should be equal. with float specific optimizations vs. int specific optimizations, should be equal. just my theories though, and so far we can see they've been wrong before :hihi:
I think you underestimate the power of the floating point hardware in modern CPUs.
