[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: SRFI-77 with more than one flonum representation



John Cowan asked me to justify why I want support for different precision floating-point numbers in Scheme. I set out to explain the problem I am trying to solve and why I thought I needed single-precision arithmentic, but in the end I have managed to convince myself that there is a better solution.

First of all, I agree that single-precision and double-precision arithmetic are now essentially indisinguishably similar in speed. Indeed, when I write scalar code, I use double precision.

However, I also write a fair bit of code that essentially takes vectors of single-precision arguments, performs scalar arithmetic on corresponding members, and produces a vector of single-precision results. The vectors are single precision either because this is external data in a specified format or to be more compact in memory.

Now, can we agree here that my reasons for using single-precision vectors for arguments and results are valid? (I concede that such vectors are beyond the scope of SRFI 77.) If we can, then we can focus on the scalar calculation that takes the arguments and produces the result.

Let us suppose that the result calculated using single-precision operations on some specific arguments is S, that the result calculated using double-precision operations on the same arguments (converted to double precision) is D, and that the result of truncating D to single precision and converting back to double precision is T.

The problem is that there are relations that are satisfied by S or T but not by D. For example, if I multiply the largest finite single-precision number by 2 in single precision, I get Inf, but if I do so in double precision I get a finite number. If I then truncate this finite number to single precision and convert back to double precision, I get a Inf. There are similar problems with inequalities that are satisfied by S or T but not D or vice versa because of the extra precision in D.

So, for example, I could calculate D, and check that it satisfies a certain relation or inequality, write it to the single-precision vector, read it back in, giving T, and then find that it did not satisfy the same relation or inequality. This can lead to real bugs that are very difficult to find, believe me.[*]

The solutions to this problem are to allow me to calculate either S or T. To calculate S, I need the ability to specify that certain operations should be carried out in single-precision. To calculate T, I need an explicit "convert to single-precision and back to double-precision" operation.

In practice, calculating T is less convenient for programers; they have to remember to do the explicit conversion. However, it is much less of a burden on implementors than forcing them to provide single-precision arithmetic. Furthermore, it can be accomodated quite naturally even in Schemes that do everything in double precision.

In the context of SRFI 77, it would mean adding a function to "convert a flonum to a single-precision format appropriate for the implementation and then back to a flonum". I suppose SRFI 77 should also define these for double-precision too, in case the default flonum precision is extended double, as it might be on some processors.

Does this seem like a reasonable compromise? It gives the programmer a means to use fast SRFI 77 flonum arithmetic but maintain some control of the precision of results. It asks the implementor to implement two simple functions in addition to those already in SRFI 77.

Regards,

Alan

[*] One can make the argument that this "excess precision" problem is esoteric and not worth dealing with in Scheme. I would suggest that it is real, especially since I have been bitten by it in a real program. This is not a toy program, but one that has been used to produce radiation transfer calculations for about a dozen papers. (It's basically the program that got me tenure.) It's written in C, and had a loop embedded in a loop. The same inequality appeared as part of the inner loop termination criterion and as part of the outer loop termination criterion. Despite all variable being declared as double, the compiler decided to calculate one in long double precision and the other in double precision. Thus, instead of both inequalities giving the same result, there were cases where they could give different results. This is legal C89. It took me days to track down and understand the bug, days during which my grip on sanity was considerably loosened. I fixed it by using a compiler switch that forced the compiler to truncate temporary results to double precision.
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México