[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: arithmetic issues

This page is part of the web mail archives of SRFI 77 from before July 7th, 2015. The new archives for SRFI 77 contain all messages, not just those from before July 7th, 2015.



 | Date: Fri, 21 Oct 2005 19:26:19 -0500
 | From: Alan Watson <a.watson@xxxxxxxxxxxxxxxx>
 | 
 | Aubrey Jaffer wrote:
 | > Flonums often are the most difficult feature to port to new
 | > architectures.  But computer science is a half-centry old already --
 | > tiny implementations can say they implement a subset of R6RS.
 | 
 | Why do you say that?

From the experience of porting SCM to dozens of C compilers.  The
trouble is usually due to optimizers incorrectly analyzing code which
measures mantissa length or prints flonums.  We had to split some of
these routines into separate files so that the optimizer (which
sometimes couldn't be disabled) wouldn't screw it up.

 | IEEE singles and doubles (although unfortunately not extended
 | doubles) are part of all modern architectures and are implemented
 | transparently (i.e., in hardware or in software by the OS) almost
 | everywhere.  Denorms can be tricky, but it's normally just a case
 | of finding the right global flags.
 | 
 | IEEE singles and double are not difficult for an implementation,
 | but they are often not necessary for an application.  I would vote
 | to move them out of the core and into the library.
 | 
 | That is, I would mandate only unlimited size integers in the core.
 | The rest of the tower should be moved to the library.

Moving 1/3 or more of an implementation to a library isn't always
practical.  libm may not be dynamically loadable; in which case the
executable must carry around the math libraries, even when they are
not used.  One can end up having two copies of many subroutines and
some subsystems like garbage collection.

 | > Both Common-Lisp and SLIB implement MOST-POSITIVE-FIXNUM and
 | > MOST-NEGATIVE-FIXNUM, which would be good additions to R6RS.
 | 
 | If unlimited size integers are not mandated, these constants are
 | useful.  If they are, these constants are not useful.

There are times when integer calculations can be several sizes, but
one would like to pick the largest size that fits in fixnums.
Examples are:

* An odd or prime modulus for hashing;

* A product of small primes used for the initial GCD in factoring
  integers.

Each constant is computed once, but potentially used many times.  If
the number is a fixnum, then the routines using it will run more
quickly than if that number were a bignum.

In both of these examples having arithmetic-modulo-2^n is no help.

 | > In well-written interpreted implementations, fixnum operations
 | > would provide no speed increase.  SCM's type dispatch for
 | > immediate numbers is just 3 instructions: mask, test, and
 | > conditional jump.  With modern processors executing scores of
 | > instructions in the time it takes for a single cache line to be
 | > retrieved from memory, most data type dispatches use time which
 | > would otherwise be spent waiting for program or data caches.
 | 
 | What you write is correct as far as it goes, but imagine having two
 | number types (say fixnums and flonums) and comparing specific
 | operators for each type with a generic operator.  On at least some
 | processors, the generic operator will mispredict the branch for one
 | type or the other.  Mispredicted branches are often more painful
 | than L1 cache misses.  Specific operators can almost always manage
 | to avoid misdirected branches.

In SCM on most platforms, immediate numbers are coded #b110 in the
lower 3 bits.  The bignum and flonum branches would try to prefetch a
long integer from a non-long-aligned address.

I tested SCM and SCMLIT (fixnums only), both compiled with gcc -O3,
computing 2000 digits of pi 4 digits at a time on a Pentium 4 3.00GHz.
The benchmark uses only small integers.

SCM took 5330.ms, while SCMLIT took 3330.ms, a substantial savings.

I compiled an integer-only SCM (with bignums) and it took 3280.ms,
repeatably faster than SCMLIT!

With fixnums and flonums (no bignums) it takes 5470.ms.

The arithmetic subrs test first for INUMs, then bignums, then flonums.
The type dispatch for bignums and flonums is very similar.  It would
be good to find what causes the difference.

 | ...
 | > Mathematically, mixed exactness complex numbers makes no sense.
 | > Twisting the whole numeric tower around this artifice is wrong.
 | 
 | Maybe, but a cheap way to get an inexact imaginary is an number
 | with an exact zero for its real part and an inexact real for its
 | imaginary part.

It is relatively cheap in implementations which code complex +, -,
*, and / in Scheme.  In an implementation which performs these
computations in C on 16-byte structs it would add another level of
type dispatch, ballooning the source to unmanageable complexity.

Has anyone produced a compelling example of the utility of mixed-type
complexes other than for encoding reals?  Making real and complex be
disjoint types would accomplish the same type guarantees with a lot
less violence to at least 3 implementations.