[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: arithmetic issues

 | Date: Sat, 22 Oct 2005 20:39:01 -0500
 | From: Alan Watson <a.watson@xxxxxxxxxxxxxxxx>
 | Aubrey Jaffer wrote:
 | >  | > Flonums often are the most difficult feature to port to new
 | >  | > architectures.  
 | >  | 
 | >  | Why do you say that?
 | > 
 | >From the experience of porting SCM to dozens of C compilers.  
 | Okay, when you said "architecture" I thought you refered to CPU
 | instructions and data formats. Yes, compilers are a pain and there
 | are frequent bugs in the standard library.

Sorry for the confusion; platform would have been a better word.

 | >  | That is, I would mandate only unlimited size integers in the
 | >  | core.  The rest of the tower should be moved to the library.
 | ...
 | I would distinguish "moving the rest of the tower to a library in
 | the *language* *definition*" and "moving the rest of the tower to a
 | library in an *implementation*".
 | That is, moving all but exact integers out of the core of the
 | language definition simplifies the language definition and keeps it
 | grounded in things that are generally agreed to be correct.

I agree; exact integers are certainly the best basis for formal

 | However, that does not prohibit implementors from including
 | important aspects of other numbers in the core of their
 | implementation. For example, the core of their implementation could
 | have representation and garbage collection for flonums, ratnums,
 | and whatever else. You would probably end up duplicating some
 | arithmetic routines, but that's about it.
 | > [In SCM] The arithmetic subrs test first for INUMs, then bignums,
 | > then flonums.  The type dispatch for bignums and flonums is very
 | > similar.  It would be good to find what causes the difference.
 | This is my point.  The branches for inums and bignums are probably
 | predicted as taken.  Thus, when you use these generic operators on
 | flonums, you incur two mispredicted branches.  flonum-specific
 | operators would save those.

Point taken.

 |  > I tested SCM and SCMLIT (fixnums only), both compiled with gcc
 |  > -O3, computing 2000 digits of pi 4 digits at a time on a Pentium
 |  > 4 3.00GHz.  The benchmark uses only small integers.
 |  >
 |  > SCM took 5330.ms, while SCMLIT took 3330.ms, a substantial
 |  > savings.
 | However, your results suggest to me that perhaps some of the
 | branches are not predicted as they should be. It might be worth
 | using the "__builtin_expect" feature of GCC to hint to the compiler
 | that numbers are expected to be inums.

I tried adding __builtin_expect() to several of the type dispatch
macros.  The one which sped up was used mainly for testing assertions.
It reduces the times to 4480.ms (SCM) and 2970.ms (integer-only).

 | Of course, type-specific operators are "just" a performance hack to
 | get around a lack of type analysis in many implementations. Hats
 | off to stalin.

There are some other possible uses.  Algebraic Petrofsky points out
that a real-only EXPT can do (expt -27 1/3) ==> -3, which R5RS EXPT