Re: FP Hardware

 | Date: Thu, 19 May 2005 07:45:15 -0700 (PDT)
 | From: Noel Welsh <noelwelsh@xxxxxxxxx>
 | Aubrey wrote:
 | > Reducing the precision of scalar operands nets no speed 
 | > increase from floating-point hardware.
 | This is not the case in general.  The action in floating point
 | calculations is currently in the vector units (SSE2, AltiVec) found
 | in modern processors.  Indeed the Pentium 4's scalar floating point
 | unit is dismal, often making the vector unit the preferred path for
 | even scalar floating point code!  Vector units generally have fixed
 | size registers (e.g. 128-bits) meaning you can either achieve a 4x
 | speedup on single floats (32-bits) or 2x on doubles (64-bits).
 | What I've read of the Cell processor suggests it may only have
 | vectorised FP units.

The crucial word in my claim is *scalar*.  The big speedups for vector
calculations come from pipelined and parallel execution.  When the
vector units are data starved, speed is throttled by latencies, and
operand size has only a minor effect on those latencies.

 | Hence allowing the user to specify precision seems like a good
 | move.

The homogeneous arrays of SRFI-63 do that.
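
As a rough illustration of precision being chosen through the array
type, here is a small sketch.  It uses Python's standard `array` module
as a stand-in, not SRFI-63 itself; the helper `lanes_per_register` and
the constant `REGISTER_BITS` are hypothetical names, and the 128-bit
width is simply the figure from the quoted message.  It assumes the
usual 4-byte 'f' and 8-byte 'd' element sizes.

```python
from array import array

# Register width taken from the quoted message (e.g. SSE2, AltiVec).
REGISTER_BITS = 128


def lanes_per_register(arr, register_bits=REGISTER_BITS):
    """Return how many elements of arr's type fit in one vector register."""
    return register_bits // (arr.itemsize * 8)


# The element type, and hence the precision, is fixed when the
# homogeneous array is created.
singles = array('f', [0.1, 0.2, 0.3, 0.4])   # single-precision floats
doubles = array('d', [0.1, 0.2, 0.3, 0.4])   # double-precision floats

single_lanes = lanes_per_register(singles)
double_lanes = lanes_per_register(doubles)

# Storing 0.1 as a single float rounds it, so reading it back differs
# from the double-precision value.
single_is_rounded = singles[0] != doubles[0]
```

The lane counts only say how many operands a register can hold at once;
whether that turns into a speedup depends on the code actually being
vectorised and on the vector units being kept fed with data.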