[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: gratuitous optimization and benchmarking

This page is part of the web mail archives of SRFI 86 from before July 7th, 2015. The new archives for SRFI 86 contain all messages, not just those from before July 7th, 2015.

 * From: "Taylor R. Campbell" <campbell@xxxxxxxxxx>
 | Any compiler can optimize a lambda argument to CWV(*), but the key is
 | that the lambda does not require a run-time, heap-allocated closure.
 | Joo ChurlSoo's tests here are very skewed in this way, because the
 | procedure argument to mu procedures must be heap-allocated in the
 | general case, whereas the arguments to CWV can *always* be optimized
 | (and stack-allocated, as any other continuations are) even in the
 | general case.

 | ((*) I find it very bizarre that there is a significant performance
 | difference between CWV & RECEIVE in Gauche, but that's a topic for a
 | different conversation.)

 | Soo's benchmarks do not test anything beyond procedure call vs
 | procedure return, the latter of which often has the overhead for
 | multiple return values anyway in order to optimize the path for the
 | single-value case in interpreters.  All of the systems that he tested
 | except Chez and Scheme48 use completely unoptimized VALUES & CWV
 | implementations, furthermore.

 | Now let's try a slightly different set of benchmarks.  This new code
 | elides the superfluous construction of a list, but builds many
 | closures, which are almost always needed in code that uses multiple
 | return values, so this benchmark more closely measures the performance
 | of real code (cough cough cough cough).  I also added another order of
 | magnitude to the number of iterations in the T tests, because ten
 | million went by too fast for accurate measurement.  In addition, I ran
 | the GC before starting the timer in all of the benchmarks.  I'd have
 | run each test ten or so times and taken the average time, but I got
 | bored waiting for Scheme48 and Gauche.  I got roughly consistent
 | answers from T, anyway.

 | All benchmarks were run on a 333 MHz UltraSPARC 5, although the
 | Scheme systems were all compiled as 32-bit programs.  No other users
 | on the machine, &c. &c., unless someone snuck on while I wasn't
 | looking.  I forget how much RAM it has.

        T 3.1 (19)      Scheme48 0.57   Gauche 0.8.6
 | m        36.91s         184.78s         210.86s
 | v        10.33s         151.11s          83.33s
 | mm       51.15s         258.25s         361.96s
 | vv       12.8s          189.54s         105.47s

 | Each of these tests calls the multiple-value-returning-entity with a
 | continuation that computes the sum of the iteration number and the
 | three values returned by the entity.  The procedures m, v, mm, and vv
 | are identical to the similarly named procedures in Soo's original
 | code.

 | I'd have tried more Scheme systems, but I don't have access to full
 | Chez, I can't build RScheme on this machine or figure out how to use
 | it on another, and unfortunately I can't find any other Scheme that
 | uses a saner representation of multiple values than a tagged list or
 | equivalent.  (Eli Barzilay informs me that MzScheme implements
 | multiple values better than a tagged list, but I wanted to sleep
 | tonight, so I didn't try to build it and run the tests with it.)

 | I have attached the source code, soo.scm and soo.t, to this message.

 | This goes to show that anyone can make benchmarks that match the
 | results they want.  I, of course, claim that *my* benchmarks are more
 | accurate to the truth than the other ones posted here, but YMMV.
 | Although I, of course, maintain that a proper system for multiple
 | return values is more efficient than MU & NU, the best conclusion to
 | draw is probably the absence of one, and I recommend omitting any
 | mention of performance from the rationale.

On REPL, the same tests are performed after the following are loaded.
I have neither full Chez nor Gauche. (Intel 586, Debian or Windows)

(define-syntax mu
  (syntax-rules ()
    ((mu argument ...) (lambda (f) (f argument ...)))))
(define-syntax receive
  (syntax-rules ()
    ((receive formals expression body ...)
     (call-with-values (lambda () expression)
                       (lambda formals body ...)))))

(define m (mu 1 2 3))
(define v (lambda () (values 1 2 3)))
(define mm (lambda (x)
             (if (< x 0)
                 (mu 1 2 3)
                 (if (= x 0)
                     (mu 4 5 6)
                     (mu 7 8 9)))))
(define vv (lambda (x)
             (if (< x 0)
                 (values 1 2 3)
                 (if (= x 0)
                     (values 4 5 6)
                     (values 7 8 9)))))

(define-syntax dotimes
  (syntax-rules ()
    ((dotimes (i count) body0 body1 ...)
     (let ((%COUNT count))
       (do ((i 0 (+ i 1)))
           ((= i %COUNT))

(define (test-m)
  (dotimes (i 10000000)
    (m (lambda (x y z)
         (+ i x y z)))))
(define (test-v)
  (dotimes (i 10000000)
    (receive (x y z) (v)
      (+ i x y z))))
(define (test-mm)
  (dotimes (i 10000000)
    ((mm i) (lambda (x y z)
              (+ i x y z)))))
(define (test-vv)
  (dotimes (i 10000000)
    (receive (x y z) (vv i)
      (+ i x y z))))

Welcome to Scheme 48 1.3 (made by soo on Mon Jul 11 09:51:36     2005)
Copyright (c) 1993-2005 by Richard Kelsey and Jonathan Rees.
Please report bugs to scheme-48-bugs@xxxxxxxx
Get more information at http://www.s48.org/.
Type ,? (comma question-mark) for help.
> ,time (test-m)
Run time: 37.96 seconds; Elapsed time: 40.22 seconds
> ,time (test-v)
Run time: 51.44 seconds; Elapsed time: 55.49 seconds
> ,time (test-mm)
Run time: 57.45 seconds; Elapsed time: 59.66 seconds
> ,time (test-vv)
Run time: 66.79 seconds; Elapsed time: 69.29 seconds

Welcome to MzScheme version 299.100, Copyright (c) 2004-2005 PLT Scheme, Inc.
> (time (test-m))
cpu time: 30314 real time: 32607 gc time: 6495
> (time (test-v))
cpu time: 53206 real time: 58714 gc time: 20065
> (time (test-mm))
cpu time: 41039 real time: 43092 gc time: 6681
> (time (test-vv))
cpu time: 62870 real time: 65364 gc time: 20040

Version 1, Build 66 - linux-unix-gnu-x86
(c)2000-2004 Felix L. Winkelmann
Using hygienic macros
; loading /usr/local/share/chicken/chicken-highlevel-macros.scm ...
#;13> (time (test-m))
  127.11 seconds elapsed
   71.13 seconds in (major) GC
       1 mutations
      50 minor GCs
    4169 major GCs
#;14> (time (test-v))
  163.51 seconds elapsed
    92.9 seconds in (major) GC
       1 mutations
      55 minor GCs
    5306 major GCs
#;15> (time (test-mm))
  211.26 seconds elapsed
  126.54 seconds in (major) GC
       1 mutations
      37 minor GCs
    7387 major GCs
#;16> (time (test-vv))
  225.75 seconds elapsed
  131.63 seconds in (major) GC
       1 mutations
      23 minor GCs
    7687 major GCs

I wonder why my results are different from yours.

Joo ChurlSoo