Re: Arithmetic issues - feedback

This page is part of the web mail archives of SRFI 77 from before July 7th, 2015. The new archives for SRFI 77 contain all messages, not just those from before July 7th, 2015.

```For what it's worth, I've trawled through threads on
lispy mathematics in usenet archives for both cls and
cll, and I think that I've gotten a good idea of
what produces most of the complaint-generating math bugs.

Most math errors happen because information is lost
when one numeric format is converted or coerced to another.
These are of three main kinds:

First, there is the well-known problematic property of
binary-mantissa and binary-exponent representation that
no decimal fraction can be exactly represented. This is
of course is shared with virtually all languages except
those (Ada I think, and maybe COBOL) which insist on
exact decimal fractions in their spec.  Someone enters
one fraction in their source code, the system converts
it and proceeds with a slightly different fraction in
its executable, and hilarity ensues.

Second, it frequently happens that the exact numbers
(bignums) have a wider range than the inexact numbers,
which results in calculations that proceed successfully
to a correct conclusion for rational arguments and yet
"Explode" for mantissa-and-exponent arguments of exactly
the same value, returning wrong answers because the
representational range has been exceeded in some
intermediate computation.  As an exercise to the reader,
it's easy to write a function that will, in a bignums-
and-floats representation, give

(fn 3/4) => 2 and
(fn 0.75) => #inf

And such functions, accidentally arrived at, baffle and
confuse a great many newbies.

Third, sign asymmetry is often surprising.  If +x is a
number, people expect -x to be a number.

Fourth, In Lisps supporting both a floating-point format
and a rational format, roundoff errors when a number is
converted from one to the other baffle and surprise
a lot of people.  Many such conversions happen to avoid
a performance error, to wit:

Most behavior errors in lisps with infinite-precision
rationals occur because people use calculations (iterative
refinement, etc) that generate rationals whose
representation takes exponentially increasing amounts of
space and as a result consume exponentially increasing
memory space and cpu time.

My immediate conclusion from looking at this list and R5RS
(yours may be different, and probably will) is to infer
that in order to avoid as many unexpected errors as possible,
I should:

1) Represent powers of ten exactly, indicating a numeric format
with an exponent-of-five as well as an exponent-of-two.

2) Assure that every exact number has a corresponding inexact
number of precisely the same value to prevent range and
conversion errors.

3) Eschew infinite-precision rationals as a default, instead
giving numbers a large(ish) "denominator" that will allow
exact rationals up to a point the user considers "useful"
and convert to "inexact" when that point is exceeded.

So my numeric format would become, (E, S, N, D, X, Y) where
E and S are one bit each, N has as many bits as we think
necessary, D is probably no more than 8 words long but possibly
selected by a command-line switch or (numeric-mode ...) syntax,
and X and Y are probably about two biased words each.  The
numeric value is N/D * 2^(X-bias) * 5^(Y-bias), with sign bit
S and exactness bit E.  And I can do math without type dispatch,
because all my numbers are the same type.  So, unexectedly,
if I can be clever enough with my basic operations to avoid
cache stalls and conditionals, this isn't even a big performance
hit.

The operations expressed as fx+, fx-, and so on, would be done
by first examining the arguments' exactness bits to make sure
they were set, then testing to make sure the denominator is
exactly one, then making sure that (- X bias) and (- Y bias)
are nonnegative, and throwing an error should any of those
conditions fail.  Next, for each argument, I'd need to multiply
N by powers of 2 and 5 to cancel out the exponents leaving
(- X bias) = 0 and (- Y bias) = 0, throwing a different error
if this caused N to become larger than some fixed limit.
Next I'd perform the usual operation on the altered versions of
the arguments, doing a modulus operation to bring N below the
fixed limit again if necessary, and returning that result.
Right?

The operations expressed as fl+, fl-, and so on, would be done
similarly, but with more flexibility regarding X (the power of
2).  Oh, and I'd have to add code to explicitly fail if one of
the arguments happened to be exact, and coerce the result to
inexact regardless of whether an exact answer were returned
from the generic procedure. Right?

Bear

(the point of this exercise is to demonstrate that the
operations here requested are not necessarily "primitive"
and that some semi-reasonable design decisions will have
to jump through hoops in order to dumb themselves down
enough to meet the arbitrary limits they impose).

```