# Multiple precisions of floating-point arithmetic

Some floating-point applications need greater-than-64-bit-precision arithmetic; two are mentioned below.

Perhaps this SRFI should tackle the problem of providing floating-point arithmetic of various precisions. If we think this might be needed, then the specially-named-operator approach to floating-point arithmetic suggested in this SRFI (which I like, by the way) does not seem to scale well.
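
To make the scaling problem concrete: with three precisions, every operation appears three times. In the listing below only the middle row's style of name is the SRFI's; the `fl32` and `fl128` prefixes are invented for illustration:

```scheme
;; One slice of the operator namespace, replicated per precision
;; (only the middle row resembles the SRFI's names):
'(fl32+  fl32-  fl32*  fl32/  fl32sqrt  fl32sin)   ; 32-bit
'(fl+    fl-    fl*    fl/    flsqrt    flsin)     ; 64-bit (this SRFI)
'(fl128+ fl128- fl128* fl128/ fl128sqrt fl128sin)  ; 128-bit
```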

Common Lisp's approach (four standardized float subtypes, short through long, with generic arithmetic operators) is perhaps cumbersome to use properly and may be error prone, but it does allow for the implementation and use of differing precisions of floating-point arithmetic where they are useful.

Or perhaps, if one wants to keep the special-name approach, one could use the naming convention from C: "name" for the default double-precision operation, "name"f for the single-precision (32-bit) operation, and "name"l for the long-double operation, whether that is 80-bit extended precision, 128-bit quad precision, or 128-bit pair-of-64-bit-doubles precision.
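
As a sketch of how such suffixed operators might behave, here is one way to simulate them in portable Scheme. The names `flsin` and `flsinf` are illustrative only, and single precision is merely simulated by rounding a double-precision result to 24 bits, ignoring the narrower exponent range (and occasional double-rounding differences) of a true 32-bit format:

```scheme
;; Round the finite double x to `bits` bits of precision using exact
;; rational arithmetic.  A sketch only: overflow, underflow, and the
;; limited exponent range of a real narrow format are ignored.
(define (round-to-bits x bits)
  (if (zero? x)
      x
      (let loop ((e 0) (y (abs (inexact->exact x))))
        (cond ((>= y (expt 2 bits))      (loop (+ e 1) (/ y 2)))
              ((< y (expt 2 (- bits 1))) (loop (- e 1) (* y 2)))
              (else (* (if (negative? x) -1.0 1.0)
                       (exact->inexact (* (round y) (expt 2 e)))))))))

(define (flsin  x) (sin x))                      ; "name"  : double (default)
(define (flsinf x) (round-to-bits (sin x) 24))   ; "name"f : single, simulated
;; "name"l would bind to whatever long format the implementation has:
;; 64 bits of precision for x87 extended, 113 for quad precision,
;; about 106 for a pair of doubles.
```
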
Brad

Examples of effective use of 128-bit floating-point arithmetic:

The following problem was pointed out by Philip W Sharp at the University of Auckland in a talk on the long-time simulation of the solar system.

As computers get faster, round-off error accumulates more quickly: a faster machine can take more integration steps in a given amount of running time, and every step contributes its own rounding error. Indeed, scientists are reaching the end of usefulness of 64-bit IEEE floating-point arithmetic for long-time simulations of the behavior of the solar system. There's a paper here that discusses this issue:
http://anziamj.austms.org.au/V46/CTAC2004/Gra2/home.html

Basically, if you want to simulate the solar system for longer times, you'll need an underlying arithmetic with more accuracy.
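
A toy illustration of the effect, unrelated to the paper's integrators: each addition of an inexactly representable step incurs a rounding error, so taking more steps, which is exactly what a faster machine lets you do, accumulates more error.

```scheme
;; Accumulate n time steps of size dt in 64-bit arithmetic.
(define (step-sum n dt)
  (do ((i 0 (+ i 1))
       (t 0.0 (+ t dt)))
      ((= i n) t)))

(step-sum 100000000 0.1)  ; => roughly 9999999.98, not the exact 10^7,
                          ;    and the drift grows with the step count
```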

Beyond using extended-precision arithmetic for accurate evaluation of the elementary functions, this was the first "real" application I had heard of that needed more than 64-bit arithmetic.

Then Colin Percival published his paper "Rapid multiplication modulo the sum and difference of highly composite numbers",

www.ams.org/mcom/2003-72-241/S0025-5718-02-01419-9/S0025-5718-02-01419-9.pdf

which gives new bounds for the error in FFTs implemented in floating-point arithmetic. These bounds allow you to use FFTs to implement bignum arithmetic, with proven accuracy, on inputs of size 256 * (1024)^2 bits in 64-bit IEEE arithmetic. (Most codes for FFT bignum arithmetic use number-theoretic FFTs over finite fields instead.) That is not as big as some applications would like, but with 128-bit arithmetic, either so-called quad precision (a 15-bit exponent and a 113-bit mantissa) or an IBM-style long double implemented as a pair of 64-bit doubles (the same dynamic range as 64-bit IEEE arithmetic, but about 106 bits of precision), one could very easily implement fast, provably accurate bignum multiplication for sizes as big as one might ever need (and I don't think I'll live long enough to see that statement made false).
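
To show the shape of the technique, here is a toy version in Scheme. This is not Percival's code: the base-256 digit size and all of the names are invented for illustration, and nothing below enforces the paper's provable-accuracy bound. The inputs are split into base-256 digits, the digit vectors are convolved with a complex FFT in double precision, and the convolution is rounded back to exact integers:

```scheme
(define pi (acos -1.0))

;; Every other element of v, starting at index `start`.
(define (stride v start)
  (let* ((m (quotient (vector-length v) 2))
         (r (make-vector m)))
    (do ((i 0 (+ i 1)))
        ((= i m) r)
      (vector-set! r i (vector-ref v (+ start (* 2 i)))))))

;; Recursive radix-2 FFT of a vector whose length is a power of two;
;; sign is -1.0 for the forward transform, +1.0 for the inverse
;; (the inverse's 1/n factor is applied by the caller).
(define (fft v sign)
  (let ((n (vector-length v)))
    (if (= n 1)
        v
        (let* ((half  (quotient n 2))
               (evens (fft (stride v 0) sign))
               (odds  (fft (stride v 1) sign))
               (out   (make-vector n)))
          (do ((k 0 (+ k 1)))
              ((= k half) out)
            (let ((t (* (make-polar 1.0 (/ (* sign 2.0 pi k) n))
                        (vector-ref odds k))))
              (vector-set! out k          (+ (vector-ref evens k) t))
              (vector-set! out (+ k half) (- (vector-ref evens k) t))))))))

;; Number of base-256 digits of the positive integer x.
(define (digit-count x)
  (do ((x x (quotient x 256))
       (c 0 (+ c 1)))
      ((zero? x) c)))

;; Little-endian base-256 digits of x, as inexact reals, zero-padded
;; to length n.
(define (digits x n)
  (let ((r (make-vector n 0.0)))
    (do ((x x (quotient x 256))
         (i 0 (+ i 1)))
        ((zero? x) r)
      (vector-set! r i (exact->inexact (remainder x 256))))))

;; Smallest power of two that is at least m.
(define (pow2>= m)
  (do ((n 1 (* n 2))) ((>= n m) n)))

;; Multiply nonnegative exact integers a and b: transform the digit
;; vectors, multiply pointwise, invert, round each coefficient to the
;; nearest integer, and recombine by Horner evaluation in base 256
;; (which also absorbs the carries).
(define (fft-multiply a b)
  (if (or (zero? a) (zero? b))
      0
      (let* ((n  (pow2>= (+ (digit-count a) (digit-count b))))
             (fa (fft (digits a n) -1.0))
             (fb (fft (digits b n) -1.0))
             (fc (make-vector n)))
        (do ((k 0 (+ k 1)))
            ((= k n))
          (vector-set! fc k (* (vector-ref fa k) (vector-ref fb k))))
        (let ((c (fft fc 1.0)))
          (let loop ((i (- n 1)) (acc 0))
            (if (< i 0)
                acc
                (loop (- i 1)
                      (+ (* acc 256)
                         (inexact->exact
                          (round (real-part
                                  (/ (vector-ref c i) n))))))))))))

(fft-multiply 123456789 987654321)  ; => 121932631112635269
```

The rounding step at the end of `fft-multiply` is where the paper's analysis matters: the product is correct only if every convolution coefficient lands within 1/2 of the true integer, and Percival's bounds say which digit sizes and transform lengths guarantee that.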

I think that, given the effort and expense put into designing fast floating-point arithmetic units, bignum arithmetic built on floating-point FFTs will, in the end, be faster than the number-theoretic FFTs now popular among the "really big bignum" folks.