[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
# Re: SRFI-77 with more than one flonum representation

`John Cowan asked me to justify why I want support for different
``precision floating-point numbers in Scheme. I set out to explain the
``problem I am trying to solve and why I thought I needed single-precision
``arithmentic, but in the end I have managed to convince myself that there
``is a better solution.
`

`First of all, I agree that single-precision and double-precision
``arithmetic are now essentially indisinguishably similar in speed.
``Indeed, when I write scalar code, I use double precision.
`

`However, I also write a fair bit of code that essentially takes vectors
``of single-precision arguments, performs scalar arithmetic on
``corresponding members, and produces a vector of single-precision
``results. The vectors are single precision either because this is
``external data in a specified format or to be more compact in memory.
`

`Now, can we agree here that my reasons for using single-precision
``vectors for arguments and results are valid? (I concede that such
``vectors are beyond the scope of SRFI 77.) If we can, then we can focus
``on the scalar calculation that takes the arguments and produces the result.
`

`Let us suppose that the result calculated using single-precision
``operations on some specific arguments is S, that the result calculated
``using double-precision operations on the same arguments (converted to
``double precision) is D, and that the result of truncating D to single
``precision and converting back to double precision is T.
`

`The problem is that there are relations that are satisfied by S or T but
``not by D. For example, if I multiply the largest finite single-precision
``number by 2 in single precision, I get Inf, but if I do so in double
``precision I get a finite number. If I then truncate this finite number
``to single precision and convert back to double precision, I get a Inf.
``There are similar problems with inequalities that are satisfied by S or
``T but not D or vice versa because of the extra precision in D.
`

`So, for example, I could calculate D, and check that it satisfies a
``certain relation or inequality, write it to the single-precision vector,
``read it back in, giving T, and then find that it did not satisfy the
``same relation or inequality. This can lead to real bugs that are very
``difficult to find, believe me.[*]
`

`The solutions to this problem are to allow me to calculate either S or
``T. To calculate S, I need the ability to specify that certain operations
``should be carried out in single-precision. To calculate T, I need an
``explicit "convert to single-precision and back to double-precision"
``operation.
`

`In practice, calculating T is less convenient for programers; they have
``to remember to do the explicit conversion. However, it is much less of a
``burden on implementors than forcing them to provide single-precision
``arithmetic. Furthermore, it can be accomodated quite naturally even in
``Schemes that do everything in double precision.
`

`In the context of SRFI 77, it would mean adding a function to "convert a
``flonum to a single-precision format appropriate for the implementation
``and then back to a flonum". I suppose SRFI 77 should also define these
``for double-precision too, in case the default flonum precision is
``extended double, as it might be on some processors.
`

`Does this seem like a reasonable compromise? It gives the programmer a
``means to use fast SRFI 77 flonum arithmetic but maintain some control of
``the precision of results. It asks the implementor to implement two
``simple functions in addition to those already in SRFI 77.
`
Regards,
Alan

`[*] One can make the argument that this "excess precision" problem is
``esoteric and not worth dealing with in Scheme. I would suggest that it
``is real, especially since I have been bitten by it in a real program.
``This is not a toy program, but one that has been used to produce
``radiation transfer calculations for about a dozen papers. (It's
``basically the program that got me tenure.) It's written in C, and had a
``loop embedded in a loop. The same inequality appeared as part of the
``inner loop termination criterion and as part of the outer loop
``termination criterion. Despite all variable being declared as double,
``the compiler decided to calculate one in long double precision and the
``other in double precision. Thus, instead of both inequalities giving the
``same result, there were cases where they could give different results.
``This is legal C89. It took me days to track down and understand the bug,
``days during which my grip on sanity was considerably loosened. I fixed
``it by using a compiler switch that forced the compiler to truncate
``temporary results to double precision.
`--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México