How to translate PCM to floating-point
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
I really tried Google for this, but can't find an answer.
Wondering how PCM is translated to floating-point. Specifically, for 16-bit PCM,
- How does the 16-bit PCM value 00000000 00000000 translate to floating-point? I'll guess it translates to 0.0.
- How does the 16-bit PCM value 10000000 00000000 translate to floating-point? Does it translate to +1.0, -1.0, or something else?
- Conversely, which 16-bit PCM values are used for representing +1.0 and -1.0?
I used to know this. Thanks for any help.
If you criticize Spitfire Audio, the mods will lock the thread.
-
- KVRist
- 40 posts since 26 Feb, 2010
I use the following code to translate floating point to a 16-bit short:
const double sc16 = (double)0x7FFF + 0.4999999999999999;
wBuf= (short)((double)outputs[plug][0][x] * sc16);
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
DougCox wrote:
I use the following code to translate floating point to a 16-bit short:
const double sc16 = (double)0x7FFF + 0.4999999999999999;
wBuf = (short)((double)outputs[plug][0][x] * sc16);
Thanks very much for that info. To figure out what it does on the limit cases I'm curious about, I'll have to review C++ (which I haven't coded in for a decade), but I can find that info.
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
For 16 bit signed to float, divide by 32768. For float to 16 bit, multiply by 32768.
Going from float to integer, do the initial conversion into a container bigger than the destination container so that it doesn't accidentally wrap on overs. For instance, on 16 or 24 bit, you can use signed int 32.
Or alternately, do the clipping on the scaled float value, before casting into integer. That way you can save the clipped float direct to a 16 bit container and not have to fool with an intermediate 32 bit container.
Clip the float-to-int results to stay in bounds: anything lower than -0x8000 (-32768) clipped to -0x8000, and anything higher than 0x7FFF (32767) clipped to 0x7FFF.
The advantage of using a factor of 32768 is that bit patterns will stay the same after conversion. If you use 32767, the bit patterns will change. Not much change, but it just doesn't seem elegant to have them change if it can be avoided.
If you want to do it as fast as possible it is just a matter of optimizing the code in whatever way seems sensible.
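The scale-by-32768-and-clip scheme described above can be sketched in C as follows. The function names are hypothetical, chosen just for this illustration; the conversion is done in a 32-bit container so overs clip instead of wrapping:

```c
#include <stdint.h>

/* int16 -> float: divide by 32768, giving the range [-1.0, +32767/32768] */
static float int16_to_float(int16_t s)
{
    return (float)s / 32768.0f;
}

/* float -> int16: multiply by 32768, convert in a wider container,
 * then clip to the int16 range so overs cannot wrap around.
 * (A plain cast truncates toward zero; rounding is a separate refinement.) */
static int16_t float_to_int16(float f)
{
    int32_t v = (int32_t)(f * 32768.0f);
    if (v > 32767)  v = 32767;    /* clip overs to 0x7FFF  */
    if (v < -32768) v = -32768;   /* clip unders to -0x8000 */
    return (int16_t)v;
}
```

Because 32768 is a power of two, the round trip int16 -> float -> int16 is exact for every sample value, which is the "bit patterns stay the same" property mentioned above.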
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
JCJR wrote:
For 16 bit signed to float, divide by 32768. For float to 16 bit, multiply by 32768.
Thanks, I appreciate it.
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
FLWrd wrote:
The other option is of course: out = (float) in; Your scale will be -32768..+32767, and absolutely all the information is there.
Thank you too. So there are options here, not a standard. I wasn't sure about that before I asked.
This makes me wonder: what if a 32-bit FP WAV file contains the value +1.0? That wouldn't seem to be a clip if you import it into a system which multiplies by 32767 to convert to 16-bit PCM. But if you import it into a system which multiplies by 32768, then the +1.0 will clip. So there's no standard defining what represents a clip in a 32-bit FP WAV file? Or is +1.0 always considered a clip?
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
There are various opinions, but the very nature of signed ints, 2's complement arithmetic, is that the negative values can always count one more than the positive values.
The internal representation inside a float follows similar rules in the mantissa. So for instance a signed 24 bit int, can have the same bit pattern in a 32 bit float. No loss or change. In the float, if you increment the mantissa one higher than the max positive number, it changes the bit pattern and increments the exponent.
So if you use normalized signals -1.0 to +1.0 in float, there is VERY minor clipping when you convert to int.
The normalized float range is convenient merely because that is what most people use, but you can use any range you like. Also, using the normalized range, your float DSP math works the same regardless of the resolution of the source and destination streams.
If you decide to use -32768 to +32767 range in float, then you would still have to convert if the in/out streams happen to be 8 bit, 12 bit, 24 bit, 32 bit, whatever. Or alternately, you could modify your dsp math to different levels for every in/out format case, which would be messy.
Even if you decided to use a symmetrical float range of -32768 to +32768 in your float calculations, you would still have to clip when converting back to SInt16.
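The point above about a signed 24-bit int fitting in a 32-bit float can be demonstrated directly: a binary32 float has a 24-bit significand, so any 24-bit sample survives a round trip through float with no loss. A minimal sketch (the function name is made up for this example):

```c
#include <stdint.h>

/* A signed 24-bit sample, range [-8388608, 8388607], round-trips
 * through a 32-bit float unchanged: the float's 24-bit significand
 * holds every such value exactly, and scaling by 2^23 is exact. */
static int32_t roundtrip24(int32_t s24)
{
    float f = (float)s24 / 8388608.0f;   /* divide by 2^23: exact   */
    return (int32_t)(f * 8388608.0f);    /* multiply back: exact    */
}
```

Incrementing past +8388607 is where the float has to bump its exponent, which is the behavior described above; that one extra negative code (-8388608 with no positive partner) is why "VERY minor clipping" appears when converting a full-range normalized float back to int.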
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
JCJR wrote:
There are various opinions, but the very nature of signed ints, 2's complement arithmetic, is that the negative values can always count one more than the positive values.
Thanks again.
-
- KVRAF
- 2393 posts since 28 Mar, 2005
- KVRAF
- 7897 posts since 12 Feb, 2006 from Helsinki, Finland
I'd say the most important thing really is to be internally consistent. I'd also keep zeroes as zeroes and make all floating-point intervals for different integer values the same size (well, for linear PCM). But really, as long as less than half an integer bit of additive floating-point noise doesn't break the loss-less property of int->float->int (or equivalently, as long as a freshly converted float value sits midway inside the interval of values represented by the same integer value), your scheme is probably as good as any other.
In other words: to convert from integers to float, do the conversion, then divide by whatever factor you want (eg 0x7fff or 0x8000; the minor advantage of the latter is that it's a power of two, giving exact float multiply/divide). Then to convert from float to integers, multiply by the same scale factor and round to nearest (note that if you round with an integer cast, that rounds towards zero, so you need a +0.5 offset for positive and a -0.5 offset for negative values). I would argue that getting the float->integer conversion right (and consistent with whatever method you use for integer->float) is a whole lot more important than the exact scaling rule you choose.
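The round-to-nearest conversion described above can be sketched like this (the function name is invented for the example; scaling and clipping are done in double so the offset arithmetic is exact):

```c
#include <stdint.h>

/* float -> int16 with round-to-nearest: a plain integer cast truncates
 * toward zero, so add +0.5 for positive and -0.5 for negative values
 * before casting, then clip to the int16 range. */
static int16_t float_to_int16_rounded(float f)
{
    double scaled = (double)f * 32768.0;       /* same factor as int->float */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;    /* symmetric rounding offset */
    if (scaled >  32767.0) scaled =  32767.0;  /* clip overs  */
    if (scaled < -32768.0) scaled = -32768.0;  /* clip unders */
    return (int16_t)scaled;
}
```

On hardware with a round-to-nearest FPU mode, `lrintf` from `<math.h>` is a common alternative to the manual 0.5 offset, and is usually faster than changing rounding behavior by hand.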
- KVRAF
- 7897 posts since 12 Feb, 2006 from Helsinki, Finland
camsr wrote:
Is there an accepted method on how to handle the negative zero?
You should almost always treat positive and negative zeroes as equivalent. Basically they represent the same value, just with a bit of extra information about how that value was reached (and to be honest, I'm not sure if it's really safe to rely on negative vs. positive zeroes in various situations... for multiply/divide you generally get the xor of the two sign bits, but beyond that you could get anything).
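A quick C sketch of why negative zero needs no special handling in conversion code: -0.0f compares equal to +0.0f, so scaling and clipping treat both identically; only the sign bit differs, which `signbit` from `<math.h>` can reveal.

```c
#include <math.h>

/* Both zeroes scale to integer 0; the hypothetical helper below is
 * just the float->int scaling step with no special case for -0.0f. */
static int zero_to_pcm(float f)
{
    return (int)(f * 32768.0f);   /* -0.0f * 32768.0f is still -0.0f,
                                     which casts to integer 0 */
}
```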
-
- KVRAF
- 3080 posts since 17 Apr, 2005 from S.E. TN
otristan wrote:
http://blog.bjornroche.com/2009/12/int- ... there.html
Thanks for posting that, interesting article.
Option 4 is interesting, especially if one were assured that a stream could never contain overs: sacrifice just the tiniest amount of linearity to avoid having to clip overs. But if one has no guarantee that streams would be free of overs, the clipping would have to be performed anyway? The article's option 4:
4) integer>0 ? integer/0x7FFF : integer/0x8000
float>0 ? float*0x7FFF : float*0x8000
Works up to at least 24-bit. Used by at least one high end DSP and A/D/A manufacturer, and XO Wave 1.0.3.
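The asymmetric "option 4" conversion quoted above can be sketched in C as follows (function names are invented for this example). It scales the positive and negative halves by different factors so that both +1.0 and -1.0 map to full scale without clipping:

```c
#include <stdint.h>

/* int16 -> float: 0x7FFF (32767) on the positive side,
 * 0x8000 (32768) on the negative side, so -32768 -> -1.0
 * and +32767 -> +1.0 exactly. */
static float int16_to_float_asym(int16_t s)
{
    return (s > 0) ? (float)s / 32767.0f : (float)s / 32768.0f;
}

/* float -> int16: the mirror image; +1.0 -> 32767, -1.0 -> -32768,
 * so a full-scale normalized float never needs clipping
 * (assuming the stream contains no overs). */
static int16_t float_to_int16_asym(float f)
{
    return (int16_t)((f > 0.0f) ? f * 32767.0f : f * 32768.0f);
}
```

The cost is the slight nonlinearity at zero mentioned above: the positive and negative halves use different step sizes, and the round trip is no longer bit-exact for every intermediate value the way a single power-of-two factor is.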
Last edited by JCJR on Fri Jun 27, 2014 10:51 pm, edited 1 time in total.
-
- Banned
- Topic Starter
- 228 posts since 3 Feb, 2014
JCJR wrote:
Thanks for posting that, interesting article.
+1