# Multiple precisions of floating-point arithmetic

This page is part of the web mail archives of SRFI 77 from before July 7th, 2015. The new archives for SRFI 77 contain all messages, not just those from before July 7th, 2015.

Some floating-point applications need arithmetic with greater than 64-bit precision; two are mentioned below.

Perhaps this SRFI should tackle the problem of providing floating-point arithmetic in various precisions. If we think this might be needed, then the specially named operator approach to floating-point arithmetic suggested in this SRFI (which I like, by the way) does not seem to scale well.

Common Lisp has an approach (the four float types short-float, single-float, double-float, and long-float) that is perhaps cumbersome to use properly and may be error-prone. However, it does allow differing precisions of floating-point arithmetic to be implemented and used where they are useful.

Or, if one wants to keep the special-name approach, one could follow the C naming convention for operations: "name" for the default double-precision operation, "name"f for the single-precision (32-bit) operation, and "name"l for long double. The long double could be 80-bit extended precision, 128-bit quad precision, or a 128-bit pair of 64-bit doubles.
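
As a rough sketch of how the C-style suffix convention separates precisions, the Python below emulates the unsuffixed (binary64) and "f" (binary32) variants. The names `sqrt`, `sqrtf`, `add`, `addf` and `to_binary32` are hypothetical illustrations and not part of any SRFI. Python floats have no long double, so no "l" variant is shown.

```python
import math
import struct


def to_binary32(x: float) -> float:
    """Round a binary64 Python float to the nearest IEEE binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical C-style names: no suffix = double precision, "f" = single.
# Python floats have no long double, so no "l" variants are sketched here.

def sqrt(x: float) -> float:
    return math.sqrt(x)


def sqrtf(x: float) -> float:
    # Round the input to binary32, take the binary64 square root, then round
    # back to binary32. Because binary64 carries at least 2*24 + 2 bits, this
    # double rounding still gives the correctly rounded binary32 result.
    return to_binary32(math.sqrt(to_binary32(x)))


def add(x: float, y: float) -> float:
    return x + y


def addf(x: float, y: float) -> float:
    # The same double-rounding argument as in sqrtf applies to addition.
    return to_binary32(to_binary32(x) + to_binary32(y))
```

Every operation needs one such variant per supported precision, and that multiplication of names is the scaling concern with the special-name approach.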

Brad

Examples of effective use of 128-bit floating-point arithmetic:

The following problem was pointed out by Philip W. Sharp of the University of Auckland in a talk on the long-time simulation of the solar system.

As computers get faster, round-off error accumulates more quickly, because more steps are computed in the same amount of time. Indeed, scientists are reaching the end of the usefulness of 64-bit IEEE floating-point arithmetic for long-time simulations of the behavior of the solar system. This paper discusses the issue:
http://anziamj.austms.org.au/V46/CTAC2004/Gra2/home.html

Basically, if you want to simulate the solar system for longer times, you'll need an underlying arithmetic with more accuracy.
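
The toy Python sketch below illustrates how round-off grows with the number of operations. It is not an orbital integrator. It repeatedly adds a fixed step and measures the error against the exact sum of the stored step value. The function names and the choice of step are illustrative assumptions.

```python
import struct
from fractions import Fraction


def to_binary32(x: float) -> float:
    """Round a binary64 Python float to the nearest IEEE binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulated_error(n: int, step: float = 0.1, single: bool = False) -> Fraction:
    """Add `step` to an accumulator n times and return |computed - exact|.

    `exact` is the true sum of n copies of the stored (already rounded) step,
    so the result measures only the round-off of the n additions.
    """
    rnd = to_binary32 if single else float
    h = rnd(step)
    s = rnd(0.0)
    for _ in range(n):
        # In single mode, rounding the binary64 sum to binary32 gives the
        # correctly rounded binary32 sum (binary64 has >= 2*24 + 2 bits).
        s = rnd(s + h)
    return abs(Fraction(s) - n * Fraction(h))
```

The error grows with the number of additions, and it grows much faster in binary32. A long enough computation in binary64 eventually hits the same wall.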

Apart from using extended-precision arithmetic to evaluate the elementary functions accurately, this was the first "real" application I had heard of that needed more than 64-bit arithmetic.

Then Colin Percival published his paper "Rapid multiplication modulo the sum and difference of highly composite numbers",

www.ams.org/mcom/2003-72-241/S0025-5718-02-01419-9/S0025-5718-02-01419-9.pdf

which gives new bounds for the error in FFTs implemented in floating-point arithmetic. These bounds allow you to use FFTs to implement bignum arithmetic with inputs of size 256 * 1024^2 bits (that is, 2^28 bits) in 64-bit IEEE arithmetic with proven accuracy. (Most codes for FFT bignum arithmetic use number-theoretic FFTs over finite fields.)

This is not as big as some applications would like. With 128-bit arithmetic, however, one could very easily implement fast, provably accurate bignum multiplication for sizes as big as one might ever need, and I don't think I'll live long enough to see that statement proved false. The 128-bit arithmetic could be either of two kinds:

- so-called quad precision, with a 15-bit exponent and a 113-bit mantissa;
- IBM-style long double, implemented as a pair of doubles, with the same dynamic range as 64-bit IEEE arithmetic but about 106 bits of precision.
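
For the pair-of-doubles format, the following is a minimal Python sketch of the classic error-free building blocks: Knuth's TwoSum and Dekker's TwoProduct with Veltkamp splitting. It assumes binary64 round-to-nearest and ignores overflow and underflow. The `dd_add` and `dd_mul` routines are the simple "sloppy" variants, and all names here are illustrative rather than any particular library's API.

```python
from fractions import Fraction

SPLITTER = 134217729.0  # 2**27 + 1, the Veltkamp/Dekker splitting constant


def two_sum(a: float, b: float):
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a: float, b: float):
    """Fast TwoSum; assumes |a| >= |b|."""
    s = a + b
    return s, b - (s - a)


def split(a: float):
    """Split a double into hi + lo, each with at most 26 significant bits."""
    t = SPLITTER * a
    hi = t - (t - a)
    return hi, a - hi


def two_prod(a: float, b: float):
    """Dekker's TwoProduct: p = fl(a * b) and a * b == p + e exactly."""
    p = a * b
    ahi, alo = split(a)
    bhi, blo = split(b)
    e = ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo
    return p, e


def dd_add(x, y):
    """Add two double-doubles given as (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


def dd_mul(x, y):
    """Multiply two double-doubles given as (hi, lo) pairs."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)


def dd_value(x) -> Fraction:
    """Exact rational value of a double-double, for checking results."""
    return Fraction(x[0]) + Fraction(x[1])
```

A pair holds about 2 * 53 = 106 significand bits, but its exponent range is that of a single double. This matches the trade-off described for the IBM-style format.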

I think that, given the effort and expense put into designing fast floating-point arithmetic units, bignum arithmetic built on floating-point FFTs will, in the end, be faster than the number-theoretic FFTs now popular among the "really big bignum" folks.
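
A minimal Python sketch of the basic idea of bignum multiplication with a floating-point FFT follows:

1. Split each integer into small limbs.
2. Convolve the two limb sequences with a complex binary64 FFT.
3. Round each coefficient to the nearest integer.

The 8-bit limb size, the function names, and the 0.25 rounding guard are illustrative assumptions. The sketch does not implement Percival's error bounds, which are what would make a given size provably safe.

```python
import cmath

LIMB_BITS = 8                     # small limbs keep coefficients far below 2**53
LIMB_MASK = (1 << LIMB_BITS) - 1


def fft(a: list, invert: bool = False) -> None:
    """In-place iterative radix-2 Cooley-Tukey FFT; len(a) must be a power of 2."""
    n = len(a)
    j = 0
    for i in range(1, n):         # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    sign = 1 if invert else -1
    length = 2
    while length <= n:
        half = length // 2
        # Twiddle factors computed directly, not by repeated multiplication,
        # which would accumulate extra round-off.
        w = [cmath.exp(sign * 2j * cmath.pi * k / length) for k in range(half)]
        for start in range(0, n, length):
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w[k]
                a[start + k] = u + v
                a[start + k + half] = u - v
        length *= 2
    if invert:
        for i in range(n):
            a[i] /= n


def to_limbs(x: int) -> list:
    """Little-endian base-2**LIMB_BITS digits of a nonnegative integer."""
    limbs = []
    while x:
        limbs.append(x & LIMB_MASK)
        x >>= LIMB_BITS
    return limbs or [0]


def fft_multiply(x: int, y: int) -> int:
    """Multiply nonnegative integers by floating-point FFT convolution of limbs."""
    if x < 0 or y < 0:
        raise ValueError("nonnegative integers only")
    a, b = to_limbs(x), to_limbs(y)
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = [complex(v) for v in a] + [0j] * (n - len(a))
    fb = [complex(v) for v in b] + [0j] * (n - len(b))
    fft(fa)
    fft(fb)
    prod = [p * q for p, q in zip(fa, fb)]
    fft(prod, invert=True)
    result = 0
    for i, c in enumerate(prod):
        r = round(c.real)
        # Heuristic guard only; not a proven error bound.
        if abs(c.real - r) > 0.25:
            raise ArithmeticError("FFT round-off too large for this size")
        result += r << (LIMB_BITS * i)   # Python ints absorb the carries
    return result
```

Larger limbs or longer inputs push the convolution sums toward the limits of binary64 precision. A wider floating-point format such as quad or double-double relaxes exactly that constraint.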