[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: SRFI-77 with more than one flonum representation
John Cowan asked me to justify why I want support for different
precision floating-point numbers in Scheme. I set out to explain the
problem I am trying to solve and why I thought I needed single-precision
arithmentic, but in the end I have managed to convince myself that there
is a better solution.
First of all, I agree that single-precision and double-precision
arithmetic are now essentially indisinguishably similar in speed.
Indeed, when I write scalar code, I use double precision.
However, I also write a fair bit of code that essentially takes vectors
of single-precision arguments, performs scalar arithmetic on
corresponding members, and produces a vector of single-precision
results. The vectors are single precision either because this is
external data in a specified format or to be more compact in memory.
Now, can we agree here that my reasons for using single-precision
vectors for arguments and results are valid? (I concede that such
vectors are beyond the scope of SRFI 77.) If we can, then we can focus
on the scalar calculation that takes the arguments and produces the result.
Let us suppose that the result calculated using single-precision
operations on some specific arguments is S, that the result calculated
using double-precision operations on the same arguments (converted to
double precision) is D, and that the result of truncating D to single
precision and converting back to double precision is T.
The problem is that there are relations that are satisfied by S or T but
not by D. For example, if I multiply the largest finite single-precision
number by 2 in single precision, I get Inf, but if I do so in double
precision I get a finite number. If I then truncate this finite number
to single precision and convert back to double precision, I get a Inf.
There are similar problems with inequalities that are satisfied by S or
T but not D or vice versa because of the extra precision in D.
So, for example, I could calculate D, and check that it satisfies a
certain relation or inequality, write it to the single-precision vector,
read it back in, giving T, and then find that it did not satisfy the
same relation or inequality. This can lead to real bugs that are very
difficult to find, believe me.[*]
The solutions to this problem are to allow me to calculate either S or
T. To calculate S, I need the ability to specify that certain operations
should be carried out in single-precision. To calculate T, I need an
explicit "convert to single-precision and back to double-precision"
In practice, calculating T is less convenient for programers; they have
to remember to do the explicit conversion. However, it is much less of a
burden on implementors than forcing them to provide single-precision
arithmetic. Furthermore, it can be accomodated quite naturally even in
Schemes that do everything in double precision.
In the context of SRFI 77, it would mean adding a function to "convert a
flonum to a single-precision format appropriate for the implementation
and then back to a flonum". I suppose SRFI 77 should also define these
for double-precision too, in case the default flonum precision is
extended double, as it might be on some processors.
Does this seem like a reasonable compromise? It gives the programmer a
means to use fast SRFI 77 flonum arithmetic but maintain some control of
the precision of results. It asks the implementor to implement two
simple functions in addition to those already in SRFI 77.
[*] One can make the argument that this "excess precision" problem is
esoteric and not worth dealing with in Scheme. I would suggest that it
is real, especially since I have been bitten by it in a real program.
This is not a toy program, but one that has been used to produce
radiation transfer calculations for about a dozen papers. (It's
basically the program that got me tenure.) It's written in C, and had a
loop embedded in a loop. The same inequality appeared as part of the
inner loop termination criterion and as part of the outer loop
termination criterion. Despite all variable being declared as double,
the compiler decided to calculate one in long double precision and the
other in double precision. Thus, instead of both inequalities giving the
same result, there were cases where they could give different results.
This is legal C89. It took me days to track down and understand the bug,
days during which my grip on sanity was considerably loosened. I fixed
it by using a compiler switch that forced the compiler to truncate
temporary results to double precision.
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México