[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

3-way compares and short-circuit sorting

This page is part of the web mail archives of SRFI 32 from before July 7th, 2015. The new archives for SRFI 32 contain all messages, not just those from before July 7th, 2015.



Perhaps it doesn't belong in this srfi, and it doesn't seem very
Scheme-ish, but I was wondering if anyone had given thought to C-style
comparisons that yield positive, negative or zero on comparison (as in
strcmp)?  This lets you write efficient short-circuit tests such as:

  int pred (obj a, obj b) {
    return ((cheap_func(a) - cheap_func(b))
             || (expensive_func(a) - expensive_func(b)))
            < 0;
  }

where it is rare for the initial comparison to be zero.  As it is now,
you have to either sort twice or build a custom sort pred.  Building new
procedures is what Scheme is all about, so for the latter method I use
the following utility:

(define (chain-sort-predicate key1 pred<1 pred=1 key2 pred<2 . rest)
  (define (make-sort-predicate key pred)
    (if (eq? key identity)
      pred
      (lambda (a b) (pred (key a) (key b)))))
  (let ((secondary
         (if (null? rest)
           (make-sort-predicate key2 pred<2)
           (apply chain-sort-predicate (append (list key2 pred<2) rest)))))
    (lambda (a b)
      (let ((a1 (key1 a)) (b1 (key1 b)))
        (or (pred<1 a1 b1)
            (and (pred=1 a1 b1)
                 (secondary a b)))))))

which, without the sort-keys and variable-arity could just be

(define (chain-sort-predicate pred<1 pred=1 pred<2)
  (lambda (a b)
    (or (pred<1 a b)
        (and (pred=1 a b)
             (pred<2 a b)))))

which lets you do

   (sort data (chain-sort-predicate pred<1 pred=1 pred<2))

but the problem arises when pred<1 and pred=1 are not so cheap.  Usually
the steps needed to determine their value are the same, so it is a waste
to make the computation twice.

The solution in other languages is to define a general purpose
comparator, cmp in Python, <=> in Perl, which returns this trinary
value.  [This is also useful in general for OO-programming because you
can define just this method and you get <, >, =, et al for free on a
given class.]  Then you can sort with something-like

  (sort data (lambda (a b) (nzero-or (cmp1 a b) (cmp2 a b))))

and maybe define utilities to make that cleaner.

Anyway, just some rambling, thought I'd mention it...

-- 
Alex