Sunday, May 1, 2011

IEEE floating point representation issue

Suppose a hypothetical 6-bit floating point representation, with 1 sign bit, the exponent occupying 3 bits, and the fraction occupying 2 bits.

What is the largest finite number (that is, excluding ∞) that this 6-bit floating point representation can represent? The question comes from page 71 of the book "Computer Systems: A Programmer's Perspective". I worked it out to be 28, but the book says it's 14.

The errata on the book's site don't list this, and I can't figure out what I'm doing wrong in my reasoning.

From Stack Overflow
  • The exponent is signed, so with 2 bits you get exponents ranging from -2 to 1. The mantissa can represent 0-7, so together you get max = 7*2^1 = 14

    Jonathan Leffler : Fractions are normally less than one, and the exponent has 3 bits, though with a sign that leaves 2 bits to represent 0..3, I believe.
    paxdiablo : For a start, the exponent is *three* bits, not two. And while your equation actually reached 14, it does so in a way that has little to do with IEEE754 representation :-)
  • Quoting from a mythical IEEE754 site:

    The IEEE very small precision floating point standard representation requires a 6 bit word, which may be represented as numbered from 0 to 5, left to right. The first bit is the sign bit, S, the next three bits are the exponent bits, 'E', and the final two bits are the fraction 'F':

    S EEE FF
    0 1 3 45
    

    The value V represented by the word may be determined as follows:

    * If E=7 and F is nonzero, then V=NaN ("Not a number")
    * If E=7 and F is zero and S is 1, then V=-Infinity
    * If E=7 and F is zero and S is 0, then V=Infinity
    * If 0<E<7 then V=(-1)^S * 2^(E-3) * (1.F) where "1.F"
      is intended to represent the binary number created by
      prefixing F with an implicit leading 1 and a binary point.
    * If E=0 and F is nonzero, then V=(-1)^S * 2^(-2) * (0.F)
      These are "denormalized" values (no implicit leading 1).
    * If E=0 and F is zero and S is 1, then V=-0
    * If E=0 and F is zero and S is 0, then V=0
    

    So you see that the maximum number you can have is the bit pattern "0 110 11":

    v = (-1)^0 * 2^(6-3) * (1 + 1/2 + 1/4)
      =      1 *       8 * 1.75
      = 14
    

    This description is actually paraphrased from here but adjusted for the different field sizes.

    yfel : Thanks for the help! I found what I did wrong. The normalized case requires the exponent field to be neither 000 nor 111, so E (e - Bias) ranges over [-2, 3], not [-3, 4]. I forgot to subtract 1, so my result was doubled. Anyway, thanks a lot.
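
The decoding rules quoted above can be checked by brute force. The following is a minimal sketch in Python, assuming the S EEE FF layout and a bias of 3 as described in the answer; the names `decode` and `largest_finite` are made up for illustration and are not from the book or any library. It decodes all 64 bit patterns, takes the largest finite value, and also shows where 28 comes from when E=7 is mistakenly treated as a normal exponent.

```python
import math

# Hypothetical 6-bit format from the answer: 1 sign, 3 exponent, 2 fraction bits.
EXP_BITS = 3
FRAC_BITS = 2
BIAS = 2 ** (EXP_BITS - 1) - 1   # 3
EXP_MAX = 2 ** EXP_BITS - 1      # 7, reserved for Infinity and NaN


def decode(word):
    """Return the value of a 6-bit pattern (an int in 0..63) as a Python float."""
    sign = -1.0 if (word >> (EXP_BITS + FRAC_BITS)) & 1 else 1.0
    e = (word >> FRAC_BITS) & EXP_MAX       # exponent field, 0..7
    f = word & (2 ** FRAC_BITS - 1)         # fraction field, 0..3
    frac = f / 2 ** FRAC_BITS               # 0.F as a number in [0, 1)

    if e == EXP_MAX:                        # all ones: Infinity or NaN
        return sign * math.inf if f == 0 else math.nan
    if e == 0:                              # all zeros: zero or denormalized
        return sign * 2.0 ** (1 - BIAS) * frac
    return sign * 2.0 ** (e - BIAS) * (1.0 + frac)   # normalized: 1.F


def largest_finite():
    """Decode every 6-bit pattern and return the largest finite value."""
    return max(v for v in map(decode, range(64)) if math.isfinite(v))


if __name__ == "__main__":
    print(largest_finite())                    # 14.0
    print(decode(0b011011))                    # 14.0, the pattern "0 110 11"
    # Treating E=7 as a normal exponent gives the mistaken answer:
    print(2.0 ** (EXP_MAX - BIAS) * 1.75)      # 28.0
```

The pattern "0 111 00", which is one step above "0 110 11", decodes to Infinity under these rules, which is why the exponent field tops out at 6 for finite values.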
