IT Anawer: IEEE floating point representation issue

Consider a hypothetical 6-bit floating-point representation, with the fraction occupying 2 bits and the exponent occupying 3 bits.

What is the largest finite number (that is, excluding ∞) this 6-bit floating-point representation can represent? This comes from page 71 of the book "Computer Systems: A Programmer's Perspective". My guess was 28, but the book says it's 14.

The errata on the book's site didn't list this, but I can't figure out what I'm doing wrong in my reasoning.

From Stack Overflow

The exponent is signed, so with 2 bits you get exponents ranging from -2 to 1. The mantissa can represent 0-7, so together you get max = 7*2^1 = 14

Jonathan Leffler : Fractions are normally less than one, and the exponent has 3 bits, though with a sign that leaves 2 bits to represent 0..3, I believe.

paxdiablo : For a start, the exponent is *three* bits, not two. And while your equation actually reached 14, it does so in a way that has little to do with IEEE754 representation :-)

Quoting from a mythical IEEE754 site:

The IEEE very small precision floating point standard representation requires a 6 bit word, which may be represented as numbered from 0 to 5, left to right. The first bit is the sign bit, S, the next three bits are the exponent bits, 'E', and the final two bits are the fraction 'F':
```
S EEE FF
0 1 3 45
```
The value V represented by the word may be determined as follows:
```
* If E=7 and F is nonzero, then V=NaN ("Not a number")
* If E=7 and F is zero and S is 1, then V=-Infinity
* If E=7 and F is zero and S is 0, then V=Infinity
* If 0<E<7 then V=(-1)^S * 2^(E-3) * (1.F) where "1.F"
  is intended to represent the binary number created by
  prefixing F with an implicit leading 1 and a binary point.
* If E=0 and F is nonzero, then V=(-1)^S * 2^(-2) * (0.F)
  These are "unnormalized" values.
* If E=0 and F is zero and S is 1, then V=-0
* If E=0 and F is zero and S is 0, then V=0
```
So you see that the maximum number you can have is the bit pattern "0 110 11":
```
v = (-1)^0 * 2^(6-3) * (1 + 1/2 + 1/4)
  =      1 *       8 * 1.75
  = 14
```
This description is actually paraphrased from here but adjusted for the different field sizes.
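
If it helps to check the rules mechanically, here is a minimal Python sketch that decodes every 6-bit pattern according to the list above and picks the largest finite value. The names `decode6` and `largest` are made up for this illustration; they are not from the book or from any library.

```python
import math

def decode6(bits: int) -> float:
    """Decode a 6-bit pattern laid out as S EEE FF, with exponent bias 3."""
    s = (bits >> 5) & 0b1      # sign bit
    e = (bits >> 2) & 0b111    # 3-bit exponent field
    f = bits & 0b11            # 2-bit fraction field
    sign = -1.0 if s else 1.0
    if e == 7:
        # All-ones exponent is reserved: infinity if F is zero, NaN otherwise.
        return sign * math.inf if f == 0 else math.nan
    if e == 0:
        # Zero and the "unnormalized" values: 2^(-2) * 0.F
        return sign * 2.0 ** -2 * (f / 4)
    # Normalized values: 2^(E-3) * 1.F
    return sign * 2.0 ** (e - 3) * (1 + f / 4)

# Largest finite value over all 64 bit patterns.
largest = max(v for v in map(decode6, range(64)) if math.isfinite(v))
print(largest)               # 14.0
print(decode6(0b011011))     # 14.0, the pattern "0 110 11"
```

The pattern one step above it, "0 111 00", already decodes to infinity, which is why 14 is the ceiling.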

yfel : Thanks for the help! I found what I did wrong. The normalized case assumes the exponent field is neither 000 nor 111, so E (e - Bias) ranges over [-2, 3], not [-3, 4]. I forgot to subtract 1, so my result was doubled. Anyway, thanks a lot.
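
The doubling is easy to reproduce. This is a rough Python sketch, assuming the usual bias of 2^(k-1) - 1 for a k-bit exponent; the function `largest_normalized` and its `reserve_all_ones` flag are hypothetical names used only for this illustration.

```python
def largest_normalized(exp_bits: int, frac_bits: int,
                       reserve_all_ones: bool = True) -> float:
    """Largest value of the form 2^(e - bias) * 1.F for the given field widths.

    With reserve_all_ones=True the all-ones exponent field is excluded,
    because it encodes infinity and NaN. Setting it to False reproduces
    the mistaken count that treats 111 as an ordinary exponent.
    """
    bias = 2 ** (exp_bits - 1) - 1
    e_max = 2 ** exp_bits - 1 - (1 if reserve_all_ones else 0)
    significand = 2 - 2 ** -frac_bits   # 1.11...1 in binary
    return 2.0 ** (e_max - bias) * significand

print(largest_normalized(3, 2))                          # 14.0
print(largest_normalized(3, 2, reserve_all_ones=False))  # 28.0
```

With 3 exponent bits and 2 fraction bits, the correct top exponent is 6 - 3 = 3, giving 8 * 1.75 = 14. Letting the field reach 7 gives 16 * 1.75 = 28, which is the guess in the question.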

Sunday, May 1, 2011
