
Re: SRFI-77 with more than one flonum representation



This is my second reply to John Cowan. In the first, I argue that my need to use single-precision floating point numbers is based on a desire for correctness, not efficiency. Here, I address efficiency.

John Cowan wrote:
I think the burden of persuasion now lies on you (or someone else in
your position) to show that:

1) there are still significant architectures in which different kinds
of floating-point numbers represent a significant tradeoff (as was
historically the case, single-float being faster but less precise and
with a smaller range), such that it does not make sense to privilege
one over the other; and that

2) this feature warrants support, even if halfhearted, from the Scheme
standard rather than being left as implementation-dependent.

I believe this will be a difficult burden to meet.

I present two examples. The first is a flight of fantasy, but an interesting one. The second is real, and something that could be implemented now.

1. Vectorizable code on an x86 or x86-64 processor with SSE and SSE2 can run twice as fast in single precision as in double precision. To a large degree this holds regardless of the size of the vectors (i.e., it holds even for small vectors that are not limited by memory bandwidth).
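To make the factor of two concrete, here is a minimal sketch in Python. It only counts vector operations under the assumption that each SSE/SSE2 instruction processes one full 128-bit register; the function names are hypothetical, and it is a back-of-the-envelope model rather than a benchmark.

```python
REGISTER_BITS = 128  # width of an SSE/SSE2 XMM register


def lanes(element_bits):
    """Number of elements of the given width that fit in one register."""
    return REGISTER_BITS // element_bits


def vector_ops(n_elements, element_bits):
    """Packed operations needed to process n_elements (ceiling division)."""
    per_op = lanes(element_bits)
    return -(-n_elements // per_op)


single_ops = vector_ops(1000, 32)  # 32-bit single precision
double_ops = vector_ops(1000, 64)  # 64-bit double precision
```

Under this model the double-precision loop issues twice as many packed operations as the single-precision loop, whatever the length of the vectors.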

I know of no vectorizing Scheme compiler, so it is difficult to argue that the Scheme standard should be bent to support such a hypothetical compiler. On the other hand, I do not think that the Scheme standard should be written in such a way that it is difficult to write an efficient vectorizing compiler that behaves naturally and can handle both single- and double-precision representations. So, sure, the standard should not require implementations to have more than one floating-point representation, but it should not rule out that possibility either (i.e., please keep s, d, and l exponents).

2. An implementation on a 64-bit machine can probably represent single-precision numbers as unboxed types but would probably have to box double-precision numbers. This may well make single-precision arithmetic in Scheme faster than double-precision arithmetic in Scheme, even if both are equally fast at the hardware level. Of course, more sophisticated compilers may be able to unbox both types in some circumstances.
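A minimal sketch of the idea, in Python, assuming a simple tagging scheme in which the low 32 bits of a 64-bit word hold a type tag and the high 32 bits hold the IEEE single-precision bit pattern. The tag value and all function names are hypothetical; a real implementation would choose its own layout.

```python
import struct

SINGLE_TAG = 0x5        # hypothetical tag marking an unboxed single
TAG_MASK = 0xFFFFFFFF   # low 32 bits are reserved for the tag


def single_to_word(x):
    """Pack x, rounded to single precision, into a tagged 64-bit word."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return (bits << 32) | SINGLE_TAG


def word_is_single(word):
    """True if the word carries the (hypothetical) single-precision tag."""
    return (word & TAG_MASK) == SINGLE_TAG


def word_to_single(word):
    """Recover the single-precision value from a tagged word."""
    (x,) = struct.unpack("<f", struct.pack("<I", word >> 32))
    return x


def double_payload_bits(x):
    """Bits occupied by the IEEE double-precision encoding of x."""
    return len(struct.pack("<d", x)) * 8
```

A single fits in the word with 32 bits to spare for the tag, so no heap allocation is needed. A double already occupies all 64 bits, so under this simple scheme it leaves no room for a tag and has to be boxed.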

I know of no implementation that has unboxed singles, but 64-bit machines are here and now and this is an obvious optimization. To take full advantage of this, we would need a version of SRFI 77 that is specific to s-exponent numbers. Now, let us consider a standard that mandated one version of SRFI 77 that worked with s-exponent numbers and another that worked with e-exponent numbers. If a Scheme implementation has only one floating-point representation, these will be identical. If it has unboxed single-precision numbers, the first version will be more efficient than the second, albeit at some cost in precision.
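The cost in precision can be illustrated with a small Python sketch. It assumes an implementation that reads s-exponent literals as IEEE single precision and e-exponent literals as IEEE double precision, which the standard need not require; the helper name is hypothetical.

```python
import struct


def round_to_single(x):
    """Round a double-precision value to the nearest single-precision value."""
    (y,) = struct.unpack("<f", struct.pack("<f", x))
    return y


double_tenth = 0.1                    # what 1e-1 would denote as a double
single_tenth = round_to_single(0.1)   # what 1s-1 would denote as a single
error = abs(single_tenth - double_tenth)
```

Here the two readings of one tenth differ by roughly 1.5e-9, whereas a value such as 0.5 that is exactly representable in both formats is unchanged.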

Is this second example at all convincing?

Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México