Title

Sorting and Merging

Author

Aubrey Jaffer

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-95@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2006-10-11
Draft: 2006-10-17--2006-12-16
Revised: 2006-11-06
Revised: 2006-11-23
Final: 2007-01-29
Revised to fix errata:
- 2020-03-07 (Fix bug in sample implementation.)

Abstract

Sorting and Merging are useful operations deserving a common API.

Issues

Changed so that the algorithms are required to call key arguments no more than once per element.
As a consequence, the pair allocation constraints for sort! and merge! are removed.
These procedures are stable only for less? predicates which return #f when applied to identical arguments. With non-empty sequence arguments, less? can easily be tested. Should these procedures signal an error when given reflexive predicates? Should they silently replace less? with
```
(lambda (a b) (not (less? b a)))
```

Rationale

General purpose software libraries are about simplicity and ease of use, not theoretical perfection in algorithm design. A sorting library should be specified so that its routines will perform moderately well for moderately sized inputs in the vast majority of applications.

When SRFI 32 Sort Libraries was withdrawn, it had 28 procedures. Having many variants in a general-purpose sorting library has disadvantages:

When there are only a few paths through the code, the code gets thoroughly tested and its behavior well understood. When there are many paths, most of the code is not well tested and not well understood.
To choose optimal sort algorithms requires nearly as much understanding as to write them. Most users don't.
A module with too many functions and voluminous documentation scares off the typical user looking to just sort a 50-element list; who then goes searching for any old sort algorithm to reinvent the wheel.
If some of the default sorts are unstable, then users will be surprised that sorting their data twice results in different orders; or that vector and list sorts return different orders.

The table in Wikipedia Sorting algorithm, shows that the merge-sort class of algorithms is optimal in terms of space and time asymptotic behavior except for the best case time, which is obtained at the expense of making the sort unstable.

This SRFI's sort procedures operate on lists and arrays, which includes vectors; the merge procedures operate on lists.

SRFI 32's vector routines took optional arguments to restrict their operations to a subrange of the vector. SRFI 63 shared subarrays (using make-shared-array or SLIB's subarray) eliminate the need for these optional arguments.

The present SRFI procedures take an optional procedure argument equivalent to Common-Lisp's &KEY argument.

Specification

These procedures are stable when called with predicates which return #f when applied to identical arguments.

The sorted?, merge, and merge! procedures consume asymptotic time and space no larger than O(N), where N is the sum of the lengths of the sequence arguments. The sort and sort! procedures consume asymptotic time and space no larger than O(N*log(N)), where N is the length of the sequence argument.

All five functions take an optional key argument corresponding to a CL-style `&key' argument. A less? predicate with a key argument behaves like:

(lambda (x y) (less? (key x) (key y)))

All five functions will call the key argument at most once per element.

The `!' variants sort in place; sort! returns its sequence argument.

Function: sorted? sequence less?
Function: sorted? sequence less? key: Returns #t when the sequence argument is in non-decreasing order according to less? (that is, there is no adjacent pair ... x y ... for which (less? y x)).
Returns #f when the sequence contains at least one out-of-order pair. It is an error if the sequence is not a list or array (including vectors and strings).

Function: merge list1 list2 less?
Function: merge list1 list2 less? key: Merges two sorted lists, returning a freshly allocated list as its result.

Function: merge! list1 list2 less?
Function: merge! list1 list2 less? key: Merges two sorted lists, re-using the pairs of list1 and list2 to build the result. The result will be either list1 or list2.

Function: sort sequence less?

Function: sort sequence less? key

Accepts a list or array (including vectors and strings) for sequence; and returns a completely new sequence which is sorted according to less?. The returned sequence is the same type as the argument sequence. Given valid arguments, it is always the case that:

(sorted? (sort sequence less?) less?) => #t

Function: sort! sequence less?

Function: sort! sequence less? key

Returns list, array, vector, or string sequence which has been mutated to order its elements according to less?. Given valid arguments, it is always the case that:

(sorted? (sort! sequence less?) less?) => #t

Implementation

slib/sort.scm implements procedures for R4RS or R5RS compliant Scheme implementations with arrays as implemented by slib/array.scm or SRFI 63.

;;; "sort.scm" Defines: sorted?, merge, merge!, sort, sort!
;;; Author : Richard A. O'Keefe (based on Prolog code by D.H.D.Warren)
;;;
;;; This code is in the public domain.

;;; Updated: 11 June 1991
;;; Modified for scheme library: Aubrey Jaffer 19 Sept. 1991
;;; Updated: 19 June 1995
;;; (sort, sort!, sorted?): Generalized to strings by jaffer: 2003-09-09
;;; (sort, sort!, sorted?): Generalized to arrays by jaffer: 2003-10-04
;;; jaffer: 2006-10-08:
;;; (sort, sort!, sorted?, merge, merge!): Added optional KEY argument.
;;; jaffer: 2006-11-05:
;;; (sorted?, merge, merge!, sort, sort!): Call KEY arg at most once
;;; per element.

(require 'array)

;;; (sorted? sequence less?)
;;; is true when sequence is a list (x0 x1 ... xm) or a vector #(x0 ... xm)
;;; such that for all 1 <= i <= m,
;;;     (not (less? (list-ref list i) (list-ref list (- i 1)))).
;@
(define (sorted? seq less? . opt-key)
  (define key (if (null? opt-key) identity (car opt-key)))
  (cond ((null? seq) #t)
	((array? seq)
	 (let ((dimax (+ -1 (car (array-dimensions seq)))))
	   (or (<= dimax 1)
	       (let loop ((idx (+ -1 dimax))
			  (last (key (array-ref seq dimax))))
		 (or (negative? idx)
		     (let ((nxt (key (array-ref seq idx))))
                       (and (not (less? last nxt))
			    (loop (+ -1 idx) nxt))))))))
	((null? (cdr seq)) #t)
	(else
	 (let loop ((last (key (car seq)))
		    (next (cdr seq)))
	   (or (null? next)
	       (let ((nxt (key (car next))))
		 (and (not (less? nxt last))
		      (loop nxt (cdr next)))))))))

;;; (merge a b less?)
;;; takes two lists a and b such that (sorted? a less?) and (sorted? b less?)
;;; and returns a new list in which the elements of a and b have been stably
;;; interleaved so that (sorted? (merge a b less?) less?).
;;; Note:  this does _not_ accept arrays.  See below.
;@
(define (merge a b less? . opt-key)
  (define key (if (null? opt-key) identity (car opt-key)))
  (cond ((null? a) b)
	((null? b) a)
	(else
	 (let loop ((x (car a)) (kx (key (car a))) (a (cdr a))
		    (y (car b)) (ky (key (car b))) (b (cdr b)))
	   ;; The loop handles the merging of non-empty lists.  It has
	   ;; been written this way to save testing and car/cdring.
	   (if (less? ky kx)
	       (if (null? b)
		   (cons y (cons x a))
		   (cons y (loop x kx a (car b) (key (car b)) (cdr b))))
	       ;; x <= y
	       (if (null? a)
		   (cons x (cons y b))
		   (cons x (loop (car a) (key (car a)) (cdr a) y ky b))))))))

(define (sort:merge! a b less? key)
  (define (loop r a kcara b kcarb)
    (cond ((less? kcarb kcara)
	   (set-cdr! r b)
	   (if (null? (cdr b))
	       (set-cdr! b a)
	       (loop b a kcara (cdr b) (key (cadr b)))))
	  (else				; (car a) <= (car b)
	   (set-cdr! r a)
	   (if (null? (cdr a))
	       (set-cdr! a b)
	       (loop a (cdr a) (key (cadr a)) b kcarb)))))
  (cond ((null? a) b)
	((null? b) a)
	(else
	 (let ((kcara (key (car a)))
	       (kcarb (key (car b))))
	   (cond
	    ((less? kcarb kcara)
	     (if (null? (cdr b))
		 (set-cdr! b a)
		 (loop b a kcara (cdr b) (key (cadr b))))
	     b)
	    (else			; (car a) <= (car b)
	     (if (null? (cdr a))
		 (set-cdr! a b)
		 (loop a (cdr a) (key (cadr a)) b kcarb))
	     a))))))

;;; takes two sorted lists a and b and smashes their cdr fields to form a
;;; single sorted list including the elements of both.
;;; Note:  this does _not_ accept arrays.
;@
(define (merge! a b less? . opt-key)
  (sort:merge! a b less? (if (null? opt-key) identity (car opt-key))))

(define (sort:sort-list! seq less? key)
  (define keyer (if key car identity))
  (define (step n)
    (cond ((> n 2) (let* ((j (quotient n 2))
			  (a (step j))
			  (k (- n j))
			  (b (step k)))
		     (sort:merge! a b less? keyer)))
	  ((= n 2) (let ((x (car seq))
			 (y (cadr seq))
			 (p seq))
		     (set! seq (cddr seq))
		     (cond ((less? (keyer y) (keyer x))
			    (set-car! p y)
			    (set-car! (cdr p) x)))
		     (set-cdr! (cdr p) '())
		     p))
	  ((= n 1) (let ((p seq))
		     (set! seq (cdr seq))
		     (set-cdr! p '())
		     p))
	  (else '())))
  (define (key-wrap! lst)
    (cond ((null? lst))
	  (else (set-car! lst (cons (key (car lst)) (car lst)))
		(key-wrap! (cdr lst)))))
  (define (key-unwrap! lst)
    (cond ((null? lst))
	  (else (set-car! lst (cdar lst))
		(key-unwrap! (cdr lst)))))
  (cond (key
	 (key-wrap! seq)
	 (set! seq (step (length seq)))
	 (key-unwrap! seq)
	 seq)
	(else
	 (step (length seq)))))

(define (rank-1-array->list array)
  (define dimensions (array-dimensions array))
  (do ((idx (+ -1 (car dimensions)) (+ -1 idx))
       (lst '() (cons (array-ref array idx) lst)))
      ((< idx 0) lst)))

;;; (sort! sequence less?)
;;; sorts the list, array, or string sequence destructively.  It uses
;;; a version of merge-sort invented, to the best of my knowledge, by
;;; David H. D.  Warren, and first used in the DEC-10 Prolog system.
;;; R. A. O'Keefe adapted it to work destructively in Scheme.
;;; A. Jaffer modified to always return the original list.
;@
(define (sort! seq less? . opt-key)
  (define key (if (null? opt-key) #f (car opt-key)))
  (cond ((array? seq)
	 (let ((dims (array-dimensions seq)))
	   (do ((sorted (sort:sort-list! (rank-1-array->list seq) less? key)
			(cdr sorted))
		(i 0 (+ i 1)))
	       ((null? sorted) seq)
	     (array-set! seq (car sorted) i))))
	(else			      ; otherwise, assume it is a list
	 (let ((ret (sort:sort-list! seq less? key)))
	   (if (not (eq? ret seq))
	       (do ((crt ret (cdr crt)))
		   ((eq? (cdr crt) seq)
		    (set-cdr! crt ret)
		    (let ((scar (car seq)) (scdr (cdr seq)))
		      (set-car! seq (car ret)) (set-cdr! seq (cdr ret))
		      (set-car! ret scar) (set-cdr! ret scdr)))))
	   seq))))

;;; (sort sequence less?)
;;; sorts a array, string, or list non-destructively.  It does this
;;; by sorting a copy of the sequence.  My understanding is that the
;;; Standard says that the result of append is always "newly
;;; allocated" except for sharing structure with "the last argument",
;;; so (append x '()) ought to be a standard way of copying a list x.
;@
(define (sort seq less? . opt-key)
  (define key (if (null? opt-key) #f (car opt-key)))
  (cond ((array? seq)
	 (let ((dims (array-dimensions seq)))
	   (define newra (apply make-array seq dims))
	   (do ((sorted (sort:sort-list! (rank-1-array->list seq) less? key)
			(cdr sorted))
		(i 0 (+ i 1)))
	       ((null? sorted) newra)
	     (array-set! newra (car sorted) i))))
	(else (sort:sort-list! (append seq '()) less? key))))

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: David Van Horn