The reciprocal of the smallest non-denormal float (exponent value 1, mantissa value 0, i.e. 2^-126) has exponent value 253 and mantissa value 0 (i.e. 2^126). The exponent range is [0,255]. We can observe that by having symmetric zero, there is an even number of possible exponent states (256). If you take the reciprocal of a float larger than 2^126, the result will be a denormal. This is an asymmetric condition! Counting from 1.f (exponent value 127), there are 126 exponent levels down to the smallest normal (exponent value 1) before the denormal range starts, but 127 levels up to the largest finite exponent (254) before the INF/NAN area (255).
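To poke at the exponent asymmetry directly, here is a minimal sketch. The helper names (float_bits, bits_float, exponent_field, mantissa_field, pow2_from_field, reciprocal_field) are made up for illustration and are not from any library. It assumes a 32-bit IEEE 754 float, the default rounding mode, and no flush-to-zero or -ffast-math.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Hypothetical helpers: copy the raw bits between float and uint32_t.
   memcpy avoids the strict-aliasing problem of *(uint32_t*)&f. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* 8-bit biased exponent field and 23-bit mantissa field. */
uint32_t exponent_field(float f) { return (float_bits(f) >> 23) & 0xFFu; }
uint32_t mantissa_field(float f) { return float_bits(f) & 0x7FFFFFu; }

/* Power of two with the given exponent field (1..254) and mantissa 0. */
float pow2_from_field(uint32_t field) { return bits_float(field << 23); }

/* Exponent field of 1/x, where x is the power of two built above. */
uint32_t reciprocal_field(uint32_t field)
{
    float r = 1.0f / pow2_from_field(field);
    return exponent_field(r);
}
```

Called from a main(), reciprocal_field(1) gives 253 and reciprocal_field(253) gives 1, while reciprocal_field(254) gives 0: the reciprocal of 2^127 is 2^-127, which only exists as a denormal (mantissa field 0x400000).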
Defining the value with exponent value 103 and mantissa 0 (i.e. 2^-24) as the smallest linear addend that can still reach 1.f while being accumulated was interesting. 103 + 24 = 127, which is the exponent value of 1.f. Accumulating this value 2^24 times will reach 1.f, but it won't go higher... this was KIND OF expected (at 1.f the addend is exactly half of the last mantissa bit, and with the default round-to-nearest-even mode that tie rounds back down to 1.f), however I think it warrants more investigation. By adding small LSB offsets to the base and accumulator values I did notice some wacky results. I will just post this quick piece in case anyone wants to test what I was doing. Admittedly this test is not very robust, and I keep thinking of making some kind of function library to keep things organized.
Code:
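#include <stdio.h>
#include <stdint.h>
#include <string.h>

// Print the bits of an object, most significant byte first.
// Assumes a little-endian host.
void printBits(size_t const size, void const * const ptr)
{
    const unsigned char *b = (const unsigned char *) ptr;
    int i, j;

    for (i = (int) size - 1; i >= 0; i--) {
        for (j = 7; j >= 0; j--) {
            printf("%u", (unsigned) ((b[i] >> j) & 1));
        }
    }
    puts("");
}

int main(void)
{
    // reciprocal of lowest normal: exponent value 1, mantissa 0
    // (hex instead of a 0b literal, which is a compiler extension before C23)
    uint32_t n = 0x00800000u;
    printBits(4, &n);
    float f;
    memcpy(&f, &n, sizeof f);       // memcpy instead of *(float*)&n (strict aliasing)
    printBits(4, &f);
    float r = 1.f / f;
    printBits(4, &r);
    uint32_t s;
    memcpy(&s, &r, sizeof s);
    printf("%u\n", s >> 23);        // exponent value of 1/f: 253
    float r1 = 1.f;
    uint32_t s1;
    memcpy(&s1, &r1, sizeof s1);
    printf("%u\n", s1 >> 23);       // exponent value of 1.f: 127
    printf("%u\n", n >> 23);        // exponent value of f: 1

    // accumulating small linear value
    float l = 1.f;
    uint32_t li;
    memcpy(&li, &l, sizeof li);
    li = li - (24u << 23) + 0;      // exponent value 127 - 24 = 103; change the + 0 to add an LSB offset
    float la;
    memcpy(&la, &li, sizeof la);    // la is the addend
    //li += (1<<2);                 // optional LSB offset on the base value only
    memcpy(&l, &li, sizeof l);      // l is the base value
    printBits(4, &l);
    printf("%.28f\n", l);
    printf("%.28f\n", la);

    // l already holds one addend, so 2^24 - 1 more additions make 2^24 in total
    uint32_t c;
    for (c = 1; c < (1u << 24) + 0; c++) {
        l += la;
    }
    //l *= 16777216.f;
    printf("%.28f\n", l);
    printBits(4, &l);
    memcpy(&li, &l, sizeof li);
    printf("%u\n", li >> 23);       // exponent value of the accumulated sum
    return 0;
}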
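The "reaches 1.f but won't go higher" behaviour can be isolated in a smaller test. This is only a sketch under the same assumptions as above (32-bit IEEE 754 float, default round-to-nearest-even, no -ffast-math). bits_float and accumulate are hypothetical helper names, not part of the program above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Hypothetical helper: build a float from its raw 32 bits. */
float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Add `addend` to `start` exactly `count` times.
   volatile forces every partial sum to be stored as a float, so each
   step is rounded to float precision before the next addition. */
float accumulate(float start, float addend, uint32_t count)
{
    volatile float sum = start;
    uint32_t i;

    for (i = 0; i < count; i++) {
        sum = sum + addend;
    }
    return sum;
}
```

With addend = bits_float(0x33800000u) (exponent value 103, mantissa 0), accumulate(0.0f, addend, 1u << 24) lands exactly on 1.f, and any further additions leave it there: 1.f + 2^-24 sits exactly halfway between 1.f and the next float, and the tie rounds to the even mantissa, which is 1.f. Bumping the addend by one LSB (0x33800001u) breaks the tie, so accumulate(1.0f, bits_float(0x33800001u), 1) moves up to the next float after 1.f (bits 0x3F800001). That may be a starting point for looking at the LSB-offset results.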