How to translate PCM to floating-point
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
I really tried Google for this, but can't find an answer.
Wondering how PCM is translated to floating-point. Specifically, for 16-bit PCM,
- How does the 16-bit PCM value 00000000 00000000 translate to floating-point? I'll guess it translates to 0.0.
- How does the 16-bit PCM value 10000000 00000000 translate to floating-point? Does it translate to +1.0, -1.0, or something else?
- Conversely, which 16-bit PCM values are used for representing +1.0 and -1.0?
I used to know this. Thanks for any help.
If you criticize Spitfire Audio, the mods will lock the thread.
-
- KVRist
- 40 posts since 26 Feb, 2010
I use the following code to translate floating point to a 16-bit short:
const double sc16 = (double)0x7FFF + 0.4999999999999999;
wBuf= (short)((double)outputs[plug][0][x] * sc16);
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
DougCox wrote:
I use the following code to translate floating point to a 16-bit short:
const double sc16 = (double)0x7FFF + 0.4999999999999999;
wBuf = (short)((double)outputs[plug][0][x] * sc16);
Thanks very much for that info. To figure out what it does on the limit cases I'm curious about, I'll have to review C++ (which I haven't coded in for a decade), but I can find that info.
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
For 16 bit signed to float, divide by 32768. For float to 16 bit, multiply by 32768.
Going from float to integer, do the initial conversion into a container bigger than the destination container so that it doesn't accidentally wrap on overs. For instance, on 16 or 24 bit, you can use signed int 32.
Or alternately, do the clipping on the scaled float value, before casting into integer. That way you can save the clipped float direct to a 16 bit container and not have to fool with an intermediate 32 bit container.
Clip the float-to-int results to stay in bounds: anything lower than -0x8000 (-32768) clipped to -0x8000, and anything higher than 0x7FFF (32767) clipped to 0x7FFF.
The advantage of using a factor of 32768 is that bit patterns will stay the same after conversion. If you use 32767, the bit patterns will change. Not much change, but it just doesn't seem elegant to have them change if it can be avoided.
If you want to do it as fast as possible it is just a matter of optimizing the code in whatever way seems sensible.
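The scale-by-32768-and-clip scheme described above can be sketched in C as follows. The function names are hypothetical, chosen just for this illustration; the conversion is done in a 32-bit container so overs clip instead of wrapping:

```c
#include <stdint.h>

/* int16 -> float: divide by 32768, giving the range [-1.0, +32767/32768] */
static float int16_to_float(int16_t s)
{
    return (float)s / 32768.0f;
}

/* float -> int16: multiply by 32768, convert in a wider container,
 * then clip to the int16 range so overs cannot wrap around.
 * (A plain cast truncates toward zero; rounding is a separate refinement.) */
static int16_t float_to_int16(float f)
{
    int32_t v = (int32_t)(f * 32768.0f);
    if (v > 32767)  v = 32767;    /* clip overs to 0x7FFF  */
    if (v < -32768) v = -32768;   /* clip unders to -0x8000 */
    return (int16_t)v;
}
```

Because 32768 is a power of two, the round trip int16 -> float -> int16 is exact for every sample value, which is the "bit patterns stay the same" property mentioned above.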
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
JCJR wrote:
For 16 bit signed to float, divide by 32768. For float to 16 bit, multiply by 32768.
Thanks, I appreciate it.
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
FLWrd wrote:
The other option is of course: out = (float) in; Your scale will be -32768..+32767, and absolutely all the information is there.
Thank you too. So there are options here, not a standard. I wasn't sure about that before I asked.
This makes me wonder: what if a 32-bit FP WAV file contains the value +1.0? That wouldn't seem to be a clip if you import it into a system which multiplies by 32767 to convert to 16-bit PCM. But if you import it into a system which multiplies by 32768, then the +1.0 will clip. So there's no standard defining what represents a clip in a 32-bit FP WAV file? Or is +1.0 always considered a clip?
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
There are various opinions, but the very nature of signed ints, 2's complement arithmetic, is that the negative values can always count one more than the positive values.
The internal representation inside a float follows similar rules in the mantissa. So for instance a signed 24 bit int, can have the same bit pattern in a 32 bit float. No loss or change. In the float, if you increment the mantissa one higher than the max positive number, it changes the bit pattern and increments the exponent.
So if you use normalized signals -1.0 to +1.0 in float, there is VERY minor clipping when you convert to int.
The normalized float range is convenient merely because that is what most people use, but you can use any range you like. Also, using the normalized range, your float DSP math works the same regardless of the resolution of the source and destination streams.
If you decide to use -32768 to +32767 range in float, then you would still have to convert if the in/out streams happen to be 8 bit, 12 bit, 24 bit, 32 bit, whatever. Or alternately, you could modify your dsp math to different levels for every in/out format case, which would be messy.
Even if you decided to use a symmetrical float range of -32768 to +32768 in your float calculations, you would still have to clip when converting back to SInt16.
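The point above about a signed 24-bit int fitting in a 32-bit float can be demonstrated directly: a binary32 float has a 24-bit significand, so any 24-bit sample survives a round trip through float with no loss. A minimal sketch (the function name is made up for this example):

```c
#include <stdint.h>

/* A signed 24-bit sample, range [-8388608, 8388607], round-trips
 * through a 32-bit float unchanged: the float's 24-bit significand
 * holds every such value exactly, and scaling by 2^23 is exact. */
static int32_t roundtrip24(int32_t s24)
{
    float f = (float)s24 / 8388608.0f;   /* divide by 2^23: exact   */
    return (int32_t)(f * 8388608.0f);    /* multiply back: exact    */
}
```

Incrementing past +8388607 is where the float has to bump its exponent, which is the behavior described above; that one extra negative code (-8388608 with no positive partner) is why "VERY minor clipping" appears when converting a full-range normalized float back to int.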
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
JCJR wrote:
There are various opinions, but the very nature of signed ints, 2's complement arithmetic, is that the negative values can always count one more than the positive values.
Thanks again.
-
- KVRAF
- 2393 posts since 28 Mar, 2005
- KVRAF
- 7897 posts since 12 Feb, 2006 from Helsinki, Finland
I'd say the most important thing really is to be internally consistent. I'd also keep zeroes as zeroes and make all floating-point intervals for different integer values the same size (well, for linear PCM). But really, as long as less than half an integer bit of additive floating-point noise doesn't break the loss-less property of int->float->int (or equivalently, as long as a freshly converted float value sits midway inside the interval of values represented by the same integer value), your scheme is probably as good as any other.
In other words: to convert from integers to float, do the conversion, then divide by whatever factor you want (eg 0x7fff or 0x8000; the minor advantage of the latter is that it's a power of two, giving exact float multiply/divide). Then to convert from float to integers, multiply by the same scale factor and round to nearest (note that if you round with an integer cast, that rounds towards zero, so you need a +0.5 offset for positive and a -0.5 offset for negative values). I would argue that getting the float->integer conversion right (and consistent with whatever method you use for integer->float) is a whole lot more important than the exact scaling rule you choose.
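The round-to-nearest conversion described above can be sketched like this (the function name is invented for the example; scaling and clipping are done in double so the offset arithmetic is exact):

```c
#include <stdint.h>

/* float -> int16 with round-to-nearest: a plain integer cast truncates
 * toward zero, so add +0.5 for positive and -0.5 for negative values
 * before casting, then clip to the int16 range. */
static int16_t float_to_int16_rounded(float f)
{
    double scaled = (double)f * 32768.0;       /* same factor as int->float */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;    /* symmetric rounding offset */
    if (scaled >  32767.0) scaled =  32767.0;  /* clip overs  */
    if (scaled < -32768.0) scaled = -32768.0;  /* clip unders */
    return (int16_t)scaled;
}
```

On hardware with a round-to-nearest FPU mode, `lrintf` from `<math.h>` is a common alternative to the manual 0.5 offset, and is usually faster than changing rounding behavior by hand.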
- KVRAF
- 7897 posts since 12 Feb, 2006 from Helsinki, Finland
camsr wrote:
Is there an accepted method on how to handle the negative zero?
You should almost always treat positive and negative zeroes as equivalent. Basically they represent the same value, just with a bit of extra information about how that value was reached (and to be honest, I'm not sure if it's really safe to rely on negative vs. positive zeroes in various situations... for multiply/divide you generally get the xor of the two sign bits, but beyond that you could get anything).
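A quick C sketch of why negative zero needs no special handling in conversion code: -0.0f compares equal to +0.0f, so scaling and clipping treat both identically; only the sign bit differs, which `signbit` from `<math.h>` can reveal.

```c
#include <math.h>

/* Both zeroes scale to integer 0; the hypothetical helper below is
 * just the float->int scaling step with no special case for -0.0f. */
static int zero_to_pcm(float f)
{
    return (int)(f * 32768.0f);   /* -0.0f * 32768.0f is still -0.0f,
                                     which casts to integer 0 */
}
```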
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
otristan wrote:
http://blog.bjornroche.com/2009/12/int- ... there.html
Thanks for posting that, interesting article.
Option 4 is interesting, especially if one were assured that a stream could never contain overs: sacrifice just the tiniest amount of linearity to avoid having to clip overs. But if one has no guarantee that streams would be free of overs, the clipping would have to be performed anyway? The article's option 4:
4) integer>0 ? integer/0x7FFF : integer/0x8000
float>0 ? float*0x7FFF : float*0x8000
Works up to at least 24-bit. Used by at least one high end DSP and A/D/A manufacturer, and XO Wave 1.0.3.
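The asymmetric "option 4" conversion quoted above can be sketched in C as follows (function names are invented for this example). It scales the positive and negative halves by different factors so that both +1.0 and -1.0 map to full scale without clipping:

```c
#include <stdint.h>

/* int16 -> float: 0x7FFF (32767) on the positive side,
 * 0x8000 (32768) on the negative side, so -32768 -> -1.0
 * and +32767 -> +1.0 exactly. */
static float int16_to_float_asym(int16_t s)
{
    return (s > 0) ? (float)s / 32767.0f : (float)s / 32768.0f;
}

/* float -> int16: the mirror image; +1.0 -> 32767, -1.0 -> -32768,
 * so a full-scale normalized float never needs clipping
 * (assuming the stream contains no overs). */
static int16_t float_to_int16_asym(float f)
{
    return (int16_t)((f > 0.0f) ? f * 32767.0f : f * 32768.0f);
}
```

The cost is the slight nonlinearity at zero mentioned above: the positive and negative halves use different step sizes, and the round trip is no longer bit-exact for every intermediate value the way a single power-of-two factor is.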
Last edited by JCJR on Fri Jun 27, 2014 10:51 pm, edited 1 time in total.
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
JCJR wrote:
Thanks for posting that, interesting article.
+1