Floating point numbers are represented by non-computers (humans) in scientific notation (** represents raising to a power)
From these examples, it is apparent that a floating point number is represented using two signed quantities: a mantissa and an exponent.
The computer represents each of these signed numbers differently in a floating point number.
For this paragraph, decimal digits will be used along with excess 49 notation for the exponent (in excess notation, the stored exponent is the true exponent plus the excess, so an exponent of 2 would be stored as 51). This is done just to make the math a little easier. The format used by the computer is structurally similar, except that hex digits are used with excess 7FH notation.
Eight digits are used to represent a floating point number: two for the exponent and six for the mantissa. The sign of the mantissa will be written as + or -, but in the computer it is represented by a bit: 1 means negative, 0 means positive.
Here are the above examples in the format recognized by the computer
This representation makes it easy to compare numbers. If two numbers have the same sign, then they can be compared numerically after the sign bit to determine which number is larger.
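To see this property in action, here is a minimal C sketch (C is not part of these notes and is used only for illustration; it assumes the compiler's float type is this 32-bit format, which is true on essentially every current machine, and the helper name float_bits is mine). It compares two positive numbers both as floats and as raw bit patterns with the sign bit masked off, and the two comparisons agree.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Copy the raw 32 bits of a float into an unsigned integer. */
    static uint32_t float_bits(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }

    int main(void)
    {
        float a = 5.8125f, b = 0.171875f;   /* two positive numbers (same sign) */

        /* Because the exponent sits in front of the mantissa, the number whose
         * bits after the sign form the larger integer also has the larger
         * magnitude. */
        printf("a > b by float compare: %d\n", a > b);
        printf("a > b by bit compare:   %d\n",
               (float_bits(a) & 0x7FFFFFFFu) > (float_bits(b) & 0x7FFFFFFFu));
        return 0;
    }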
Things aren't quite as simple as the above paragraph would indicate. If the above format were followed, then 33 bits would be needed to represent a floating point number (1 bit for the sign, 4 bits for each hex digit). 33 is bad, 32 is good. So, how is the extra bit discarded? Through absolute trickery!
Actually, all of the precision of the above format is obtained, but it is accomplished using 32 bits instead of 33. The trick is to remember that in reality these numbers are stored in binary. Also, every number is always kept in NORMALIZED form, which means that its mantissa starts with a 1, not a 0. The exponent is always adjusted to eliminate any leading 0's from the mantissa. So this is where the extra bit is squeezed in (or out). If EVERY mantissa begins with a 1, then why store that 1 in memory? Why not just have the program place a 1 at the beginning of every mantissa?
Using this trick, the layout of a number in the computer is
1 bit for the sign, 8 bits for the exponent, 23 bits for the mantissa
However, since the leading bit of the mantissa is never stored, there are actually 24 bits of precision in the mantissa. Pretty sneaky.
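As a concrete illustration of that 1 / 8 / 23 layout, here is a short C sketch (again assuming the compiler's float is this 32-bit format; the variable names sign, exponent and mantissa are my own). It pulls a float apart into its three fields and then rebuilds the value by putting the hidden leading 1 back in.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    int main(void)
    {
        float f = 5.8125f;                        /* 101.1101 in binary */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);           /* grab the raw 32 bits */

        uint32_t sign     = bits >> 31;           /* 1 bit               */
        uint32_t exponent = (bits >> 23) & 0xFF;  /* 8 bits, excess 7FH  */
        uint32_t mantissa = bits & 0x7FFFFF;      /* 23 stored bits      */

        /* Put the never-stored leading 1 back, then apply the exponent.
         * 8388608.0 is 2**23, so mantissa/8388608.0 is the fraction part.
         * (This ignores the special cases of 0, infinities, and NaN.) */
        double value = (1.0 + mantissa / 8388608.0)
                       * pow(2.0, (int)exponent - 0x7F);
        if (sign)
            value = -value;

        printf("bits = %08X  sign = %u  exponent = %02X  mantissa = %06X\n",
               (unsigned)bits, (unsigned)sign, (unsigned)exponent,
               (unsigned)mantissa);
        printf("reconstructed value = %g\n", value);   /* prints 5.8125 */
        return 0;
    }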
In base 10, a number like 0.123 represents
1/10 + 2/100 + 3/1000
What is the significance of the denominators 10, 100, 1000? They are the powers of the base (base 10). So, what would the number 0.101 represent in binary?
1/2 + 0/4 + 1/8 = 5/8
since the powers of two are 2, 4, 8.
There is another way to calculate this: just count the number of digits after the binary point, and raise 2 to that power. Since there are three digits after the point in this example, the denominator is 2**3 = 8. Then, just calculate the numerator as a binary number, in this case 101 = 5. So the final number is 5/8.
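That counting trick is easy to mechanize. Here is a small C sketch (the helper name binary_fraction is mine, not from these notes) that treats a string of bits after the binary point as a numerator over 2 raised to the number of digits.

    #include <stdio.h>
    #include <string.h>

    /* Interpret a string of 0s and 1s that follows the binary point as a
     * fraction: numerator = the digits read as a binary integer,
     * denominator = 2 raised to the number of digits.
     * (The fraction is not reduced to lowest terms.) */
    static void binary_fraction(const char *digits)
    {
        unsigned long numerator = 0, denominator = 1;
        for (size_t i = 0; i < strlen(digits); i++) {
            numerator   = numerator * 2 + (digits[i] - '0');
            denominator = denominator * 2;
        }
        printf("0.%s (binary) = %lu/%lu\n", digits, numerator, denominator);
    }

    int main(void)
    {
        binary_fraction("101");      /* prints 5/8   */
        binary_fraction("1101");     /* prints 13/16 */
        binary_fraction("001011");   /* prints 11/64 */
        return 0;
    }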
Here are some more examples
101.1101 = 5 13/16
-11101.11101 = -29 29/32
0.001011 = 11/64
101.1101
  normalized number = 1.011101 * 2**2
  sign of mantissa = 0
  mantissa = 011101 (leading 1 is not stored)
  excess 7FH exponent = 81H = 10000001 in binary
  Binary representation of number: 0 10000001 01110100000000000000000
  Regroup: 0100 0000 1011 1010 0000 0000 0000 0000
  Hex representation of number: 40BA0000
-11101.11101
  normalized number = 1.110111101 * 2**4
  sign of mantissa = 1
  mantissa = 110111101 (leading 1 is not stored)
  excess 7FH exponent = 83H = 10000011 in binary
  Binary representation of number: 1 10000011 11011110100000000000000
  Regroup: 1100 0001 1110 1111 0100 0000 0000 0000
  Hex representation of number: C1EF4000
0.001011
  normalized number = 1.011 * 2**(-3)
  sign of mantissa = 0
  mantissa = 011 (leading 1 is not stored)
  excess 7FH exponent = 7CH = 01111100 in binary
  Binary representation of number: 0 01111100 01100000000000000000000
  Regroup: 0011 1110 0011 0000 0000 0000 0000 0000
  Hex representation of number: 3E300000
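As a quick check of the three worked examples, the following C sketch (assuming once more that float is this 32-bit format) stores each decimal value in a float and prints its raw bits; it should print 40BA0000, C1EF4000 and 3E300000.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Print a float's decimal value next to its raw 32-bit pattern in hex. */
    static void show(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        printf("%-12g -> %08X\n", f, (unsigned)bits);
    }

    int main(void)
    {
        show(5.8125f);     /* 101.1101     =  5 13/16  -> 40BA0000 */
        show(-29.90625f);  /* -11101.11101 = -29 29/32 -> C1EF4000 */
        show(0.171875f);   /* 0.001011     =  11/64    -> 3E300000 */
        return 0;
    }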
There are two standard formats for floating point numbers according to IEEE