Title

Sort Libraries

Author

Olin Shivers

Status

This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-32@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2002-07-12
Draft: 2002-07-15--2003-10-15
Draft extension: 2003-01-31
Withdrawn: 2003-07-17
Conversion to HTML: 2019-09-20

Abstract

Current Scheme sorting packages are, every one of them, surprisingly bad. I've designed the API for a full-featured sort toolkit, which I propose as a SRFI. The spec comes with 1200 lines of high-quality reference code: tightly written, highly commented, portable code, available for free. Implementors want this code. It's better than what you have.

Abstract
Procedure index
Introduction
What's wrong with the current state of affairs?
Design rules
- What vs. how
- Consistency across function signatures
- Data parameter first, less-than parameter after
- Ordering, comparison functions & stability
- All vector operations accept optional subrange parameters
- Required vs. allowed side-effects
Procedure specification
- Procedure naming and functionality
- Types of parameters and return values
- sort-lib - general sorting package
- Algorithm-specific sorting packages
Algorithmic properties
Topics to be resolved during discussion phase
Porting and optimisation
References & Links
Acknowledgements
Copyright

Procedure index

list-sorted?                    vector-sorted?

list-merge                      vector-merge
list-sort                       vector-sort
list-stable-sort                vector-stable-sort
list-delete-neighbor-dups       vector-delete-neighbor-dups

list-merge!                     vector-merge!
list-sort!                      vector-sort!
list-stable-sort!               vector-stable-sort!
list-delete-neighbor-dups!      vector-delete-neighbor-dups!

heap-sort   quick-sort   insert-sort   list-merge-sort   vector-merge-sort
heap-sort!  quick-sort!  insert-sort!  list-merge-sort!  vector-merge-sort!

Introduction

As I'll detail bewlow, I wasn't very happy with the state of the Scheme world for sorting and merging lists and vectors. So I have designed and written a fairly comprehensive sorting & merging toolkit. It is

very portable,
much better code than what is currently in Elk, Gambit, Bigloo, Scheme->C, MzScheme, RScheme, Scheme48, MIT Scheme, or slib, and
priced to move: free code.

The package includes

Vector insert sort (stable)
Vector heap sort
Vector quick sort (with median-of-3 pivot picking)
Vector merge sort (stable)
Pure and destructive list merge sort (stable)
Stable vector and list merge
Miscellaneous sort-related procedures: Vector and list merging, sorted? predicates, vector binary search, vector and list delete-equal-neighbor procedures.
A general, non-algorithmic set of procedure names for general sorting and merging.

Scheme programmers may want to adopt this package. I'd like Scheme implementors to adopt this code and its API -- in fact, the code is a bribe to make it easy for implementors to converge on the suggested API. I mean, you'd really have to be a boor to take this free code I wrote and mutate its interface over to your incompatible, unportable API, wouldn't you? But you could, of course -- it's freely available. More in the spirit of the offering, you could make this API available, and then also write a little module providing your old interface that is defined in terms of this API. "Scheme implementors," in this context, includes slib, which isn't really a standalone implementation of Scheme, but is an influential collection of API's and code.

The code is tightly bummed. It is clearly written, and commented in my usual voluminous style. This includes notes on porting and implementation-specific optimisations.

What's wrong with the current state of affairs?

It's just amazing to me that in 2002, sorting and merging hasn't been completely put to bed. These are well-understood algorithms, each of them well under a page of code. The straightforward algorithms are basic, core stuff -- sophomore-level. But if you tour the major Scheme implementations out there on the Net, you find badly written code that provides extremely spotty coverage of the algorithm space. One implementation even has a buggy implementation that has been in use for about 20 years!

Open source-code is a wonderful thing. In a couple of hours, I was able to download and check the sources of 9 Scheme systems. Here are my notes from the systems I checked. You can skip to the next section if you aren't morbidly curious.

slib

sorted? vector-or-list <
merge  list1 list2 <
merge! list1 list2 <
sort  vector-or-list <
sort! vector-or-list <

Richard O'Keefe's stable list merge sort is right idea, but implemented using gratuitous variable side effects. It also does redundant SET-CDR!s. The vector sort converts to list, merge sorts, then reconverts to vector. This is a bad idea -- non-local pointer chasing bad; vector shuffling good.

MIT Scheme

sort!       vector <
merge-sort! vector <
quick-sort! vector <

sort       vector-or-list <
merge-sort vector-or-list <
quick-sort vector-or-list <

Naive vector quicksort: loser, for worst-case performance reasons. List sort by "list->vector; quicksort; vector->list," hence also loser. A clever stable vector merge sort, albeit not very bummed.

Scheme 48 & T

sort-list  list <
sort-list! list <
list-merge! list1 list2 <

Bob Nix's implementation of online merge-sort, written in the early 80's. Conses unnecessary bookkeeping structure, which isn't necessary with a proper recursive formulation. Also, does redundant SET-CDR!s. No vector sort. Also, has a bug -- is claimed to be a stable sort, but isn't! To see this, get the S48 code, and try

(define (my< x y) (< (quotient x 2) (quotient y 2)))
(list-merge! (list 0 2) (list   3) my<)  ; -> (0 2 3)
(list-merge! (list   2) (list 0 3) my<)  ; -> (0 3 2)

This could be fixed very easily, but it isn't worth it given the other problems with the algorithm.

RScheme

vector-sort! vector <
sort collection <

Good basic implementation of vector heapsort, which has O(n lg n) worst-case time. Code ugly, needs tuning. List sort by "list->vector; sort; vector->list", which allocates unneeded temp storage. Nothing for stable sorting.

MzScheme

Naive quicksort -- but not available for vector sorting, even though it internally uses a vector. Nothing for stable sorting, and naive quicksort has bad worst-case behaviour.

Bigloo, Scheme->C

Couldn't find anything -- but maybe I didn't search for the right thing, since the Bigloo names are French. (I invite correction from the Bigloo implementors.)

Gambit

sort-list list <

Nothing for vectors. Simple, slow, unstable merge sort for lists.

Elk

Another naive quicksort. Lists handled by converting to vector.

sort  vector-or-list <
sort! vector-or-list <

Chez Scheme

merge  < list1 list2
merge! < list1 list2
sort  < list
sort! < list

These are stable. I have not seen the source code.

Common Lisp

sort        sequence < [key]
stable-sort sequence < [key]
merge result-type sequence1 sequence2 < [key]

The sort procedures are allowed, but not required, to be destructive.

SML/NJ

sort: ('a*'a -> bool) -> 'a list -> 'a list

"Smooth applicative merge sort," which is stable. There is also a highly bummed quicksort for vectors.

The right solution: Implement a full toolbox of carefully written standard sort routines.

Having the source available for all of these above-cited Schemes made life a lot easier writing this code. I appreciate the authors making their source available under such open terms.

Design rules

What vs. how

There are two different interfaces: "what" (simple) & "how" (detailed).

Simple: you specify semantics: datatype (list or vector), mutability, and stability.
Detailed: you specify the actual algorithm (quick, heap, insert, merge). Different algorithms have different properties, both semantic & pragmatic, so these exports are necessary.

It is necessarily the case that the specifications of these procedures make statements about execution "pragmatics." For example, the sole distinction between heap sort and quick sort -- both of which are provided by this library -- is one of execution time, which is not a "semantic" distinction. Similar resource-use statements are made about "iterative" procedures, meaning that they can execute on input of arbitrary size without needing to allocate an unbounded number of stack frames.

Consistency across function signatures

The two interfaces share common function signatures wherever possible, to facilitate switching a given call from one procedure to another.

Data parameter first, less-than parameter after

These procedures uniformly observe the following parameter order: the data to be sorted come before the the comparison function. That is, we write (sort lis <) not (sort < lis). This is consistent with every single implementation out there, with the sole exception of Chez Scheme.

In my opinion, it would be more consistent with other Scheme libraries to put the ordering function first -- the "operation currying" convention. (E.g., consider FOR-EACH or MAP or FIND.) I decided to leave things as they are in favor of near-total backwards compatibility with existing practice.

[Perhaps this should be discussed.]

Ordering, comparison functions & stability

These routines take a < comparison function, not a <= comparison function, and they sort into increasing order. The difference between a < spec and a <= spec comes up in two places:

the definition of an ordered or sorted data set, and
the definition of a stable sorting algorithm.

We say that a data set (a list or vector) is *sorted* or *ordered* if it contains no adjacent pair of values ... X Y ... such that Y < X.

In other words, scanning across the data never takes a "downwards" step.

If you use a <= procedure where these algorithms expect a < procedure, you may not get the answers you expect. For example, the LIST-SORTED? function will return false if you pass it a <= comparison function and an ordered list containing adjacent equal elements.
A "stable" sort is one that preserves the pre-existing order of equal elements. Suppose, for example, that we sort a list of numbers by comparing their absolute values, i.e., using comparison function (lambda (x y) (< (abs x) (abs y))) If we sort a list that contains both 3 and -3: ... 3 ... -3 ... then a stable sort is an algorithm that will not swap the order of these two elements, that is, the answer will look like ... 3 -3 ... not ... -3 3 ...

Choosing < for the comparison function instead of <= affects how stability is coded. Given an adjacent pair X Y, (< y x) means "Y should be moved in front of X" -- otherwise, leave things as they are. So using a <= function where a < function is expected will *invert* stability.

This is due to the definition of equality, given a < comparator:
```
(and (not (< x y)) (not (< y x)))
```
The definition is rather different, given a <= comparator:
```
(and (<= x y) (<= x y))
```
A "stable" merge is one that reliably favors one of its data sets when equal items appear in both data sets. *All merge operations in this library are stable*, breaking ties between data sets in favor of the first data set -- elements of the first list come before equal elements in the second list.

So, if we are merging two lists of numbers ordered by absolute value using the stable merge operation LIST-MERGE
```
(list-merge '(0 -2 4 8 -10) '(-1 3 -4 7)
            (lambda (x y) (< (abs x) (abs y))))
```
reliably places the 4 of the first list before the equal-comparing -4 of the second list:
```
(0 -1 -2 4 -4 7 8 -10)
```

In short, if your comparison function F answers true to (F x x), then using a stable sorting or merging algorithm will not give you a stable sort or merge, and LIST-SORTED? may surprise you. Note that you can synthesize a < function from a <= function with (lambda (x y) (not (<= y x))) if need be.

Precise definitions give sharp edges to tools, but require care in use. "Measure twice, cut once."

I have adopted the choice of < from Common Lisp. I assume they had a good reason for adopting < instead of <=. I'd love to know what this reason is; send me email if you can explain it, please.

All vector operations accept optional subrange parameters

The vector operations specified below all take optional START/END arguments indicating a selected subrange of a vector's elements. If a START parameter or START/END parameter pair is given to such a procedure, they must be exact, non-negative integers, such that 0 <= START <= END <= (VECTOR-LENGTH V) where V is the related vector parameter. If not specified, they default to 0 and the length of the vector, respectively. They are interpreted to select the range [START,END), that is, all elements from index START (inclusive) up to, but not including, index END.

Required vs. allowed side-effects

LIST-SORT! and LIST-STABLE-SORT! are allowed, but not required, to alter their arguments' cons cells to construct the result list. This is consistent with the what-not-how character of the group of procedures to which they belong (the "sort-lib" package).

The LIST-DELETE-NEIGHBOR-DUPS!, LIST-MERGE! and LIST-MERGE-SORT! procedures, on the other hand, provide specific algorithms, and, as such, explicitly commit to the use of side-effects on their input lists in order to guarantee their key algorithmic properties (e.g., linear-time operation, constant-space stack use).

Procedure specification

The procedures are split into several packages. In a Scheme system that has a module or package system, these procedures should be contained in modules named as follows:

Package name	Functionality
sort-lib	General sorting for lists & vectors
sorted?-lib	Sorted predicates for lists & vectors
list-merge-sort-lib	List merge sort
vector-merge-sort-lib	Vector merge sort
vector-heap-sort-lib	Vector heap sort
vector-quick-sort-lib	Vector quick sort
vector-insert-sort-lib	Vector insertion sort
delndup-lib	List and vector delete neighbor duplicates

A Scheme system without a module system should provide all of the bindings defined in all of these modules as components of the "SRFI-32" package.

Note that there is no list insert sort package, as you might as well always use list merge sort. The reference implementation's destructive list merge sort will do fewer SET-CDR!s than a destructive insert sort.

Procedure naming and functionality

Almost all of the procedures described below are variants of two basic operations: sorting and merging. These procedures are consistently named by composing a set of basic lexemes to indicate what they do.

Lexeme	Meaning
"sort"	The procedure sorts its input data set by some < comparison function.
"merge"	The procedure merges two ordered data sets into a single ordered result.
"stable"	This lexeme indicates that the sort is a stable one.
"vector"	The procedure operates upon vectors.
"list"	The procedure operates upon lists.
"!"	Procedures that end in "!" are allowed, and sometimes required, to reuse their input storage to construct their answer.

Types of parameters and return values

In the procedures specified below,

A LIS parameter is a list;
A V parameter is a vector;
A < or = parameter is a procedure accepting two arguments taken from the specified procedure's data set(s), and returning a boolean;
START and END parameters are exact, non-negative integers that serve as vector indices selecting a subrange of some associated vector. When specified, they must satisfy the relation 0 <= start <= end <= (vector-length v) where V is the associated vector.

Passing values to procedures with these parameters that do not satisfy these types is an error.

If a procedure is said to return "unspecified," this means that nothing at all is said about what the procedure returns, not even the number of return values. Such a procedure is not even required to be consistent from call to call in the nature or number of its return values. It is simply required to return a value (or values) that may be passed to a command continuation, e.g. as the value of an expression appearing as a non-terminal subform of a BEGIN expression. Note that in R5RS, this restricts such a procedure to returning a single value; non-R5RS systems may not even provide this restriction.

sort-lib - general sorting package

This library provides basic sorting and merging functionality suitable for general programming. The procedures are named by their semantic properties, i.e., what they do to the data (sort, stable sort, merge, and so forth).

Procedure	Suggested algorithm
list-sorted? lis < -> boolean
list-merge lis1 lis2 < -> list
list-merge! lis1 lis2 < -> list
list-sort lis < -> list	(vector heap or quick)
list-sort! lis < -> list	(list merge sort)
list-stable-sort lis < -> list	(vector merge sort)
list-stable-sort! lis < -> list	(list merge sort)
list-delete-neighbor-dups lis = -> list
list-delete-neighbor-dups! lis = -> list
vector-sorted? v < [start end] -> boolean
vector-merge v1 v2 < [start1 end1 start2 end2] -> vector
vector-merge! v v1 v2 < [start start1 end1 start2 end2] -> unspecific
vector-sort v < [start end] -> vector	(heap or quick sort)
vector-sort! v < [start end] -> unspecific	(heap or quick sort)
vector-stable-sort v < [start end] -> vector	(vector merge sort)
vector-stable-sort! v < [start end] -> unspecific	(vector merge sort)
vector-delete-neighbor-dups v = [start end] -> vector
vector-delete-neighbor-dups! v = [start end] -> end'

LIST-SORTED? and VECTOR-SORTED? return true if their input list or vector is in sorted order, as determined by their < comparison parameter.

All four merge operations are stable: an element of the initial list LIS1 or vector V1 will come before an equal-comparing element in the second list LIS2 or vector V2 in the result.

The procedures

LIST-MERGE
LIST-SORT
LIST-STABLE-SORT
LIST-DELETE-NEIGHBOR-DUPS

do not alter their inputs and are allowed to return a value that shares a common tail with a list argument.

The procedures

LIST-SORT!
LIST-STABLE-SORT!

are "linear update" operators -- they are allowed, but not required, to alter the cons cells of their arguments to produce their results.

On the other hand, the procedures

LIST-DELETE-NEIGHBOR-DUPS!
LIST-MERGE!

make only a single, iterative, linear-time pass over their argument lists, using SET-CDR!s to rearrange the cells of the lists into the final result -- they work "in place." Hence, any cons cell appearing in the result must have originally appeared in an input. The intent of this iterative-algorithm commitment is to allow the programmer to be sure that if, for example, LIST-MERGE! is asked to merge two ten-million-element lists, the operation will complete without performing some extremely (possibly twenty-million) deep recursion.

The vector procedures

VECTOR-SORT
VECTOR-STABLE-SORT
VECTOR-DELETE-NEIGHBOR-DUPS

do not alter their inputs, but allocate a fresh vector for their result, of length END-START.

The vector procedures

VECTOR-SORT!
VECTOR-STABLE-SORT!

sort their data in-place. (But note that VECTOR-STABLE-SORT! may allocate temporary storage proportional to the size of the input -- I am not aware of O(n lg n) stable vector sorting algorithms that run in constant space.)

VECTOR-MERGE returns a vector of length (END1-START1)+(END2-START2).

VECTOR-MERGE! writes its result into vector V, beginning at index START0, for indices less than END0 = START0 + (END1-START1) + (END2-START2). The target subvector V[start0,end0) may not overlap either source subvector V1[start1,end1) V2[start2,end2).

The DELETE-NEIGHBOR-DUP-... procedures: These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure. The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.

These procedures are linear time -- much faster than the O(n^2) general duplicate-element deletors that do not assume any "bunching" of elements (such as the ones provided by SRFI-1). If you want to delete duplicate elements from a large list or vector, sort the elements to bring equal items together, then use one of these procedures, for a total time of O(n lg n).

The comparison function = passed to these procedures is always applied (= x y) where X comes before Y in the containing list or vector.

LIST-DELETE-NEIGHBOR-DUPS does not alter its input list; its answer may share storage with the input list.
VECTOR-DELETE-NEIGHBOR-DUPS does not alter its input vector, but rather allocates a fresh vector to hold the result.
LIST-DELETE-NEIGHBOR-DUPS! is permitted, but not required, to mutate its input list in order to construct its answer.
VECTOR-DELETE-NEIGHBOR-DUPS! reuses its input vector to hold the answer, packing its answer into the index range [start,end'), where END' is the non-negative exact integer returned as its value. It returns END' as its result. The vector is not altered outside the range [start,end').

[Maybe this procedure should take a "target" vector to write?]

Examples:

(list-delete-neighbor-dups '(1 1 2 7 7 7 0 -2 -2) =)
  => (1 2 7 0 -2)

(vector-delete-neighbor-dups '#(1 1 2 7 7 7 0 -2 -2) =)
  => #(1 2 7 0 -2)

(vector-delete-neighbor-dups '#(1 1 2 7 7 7 0 -2 -2) = 3 7)
  => #(7 0 -2)

;; Result left in v[3,9):
(let ((v (vector 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6)))
  (cons (vector-delete-neighbor-dups! v = 3)
        v))
   => (9 . #(0 0 0 1 2 3 4 5 6 4 4 5 5 6 6))

Algorithm-specific sorting packages

These packages provide more specific sorting functionality, that is, specific committment to particular algorithms that have particular pragmatic consequences (such as memory locality, asymptotic running time) beyond their semantic behaviour (sorting, stable sorting, merging, etc.). Programmers that need a particular algorithm can use one of these packages.

sorted?-lib - sorted predicates

list-sorted? lis <             -> boolean
vector-sorted? v < [start end] -> boolean

Return #f iff there is an adjacent pair ... X Y ... in the input list or vector such that Y < X. The optional START/END range arguments restrict VECTOR-SORTED? to the indicated subvector.

list-merge-sort-lib - list merge sort

list-merge-sort  lis <      -> list
list-merge-sort! lis <      -> list
list-merge  lis1 lis2 <     -> list
list-merge! lis1 lis2 <     -> list

The sort procedures sort their data using a list merge sort, which is stable. (The reference implementation is, additionally, a "natural" sort. See below for the properties of this algorithm.)

The ! procedures are destructive -- they use SET-CDR!s to rearrange the cells of the lists into the proper order. As such, they do not allocate any extra cons cells -- they are "in place" sorts. Additionally, LIST-MERGE! is iterative, not recursive -- it can operate on arguments of arbitrary size without requiring an unbounded amount of stack space.

The merge operations are stable: an element of LIS1 will come before an equal-comparing element in LIS2 in the result list.

vector-merge-sort-lib - vector merge sort

vector-merge-sort  v < [start end temp]                     -> vector
vector-merge-sort! v < [start end temp]                     -> unspecific
vector-merge  v1 v2 < [start1 end1 start2 end2]             -> vector
vector-merge! v v1 v2 < [start0 start1 end1 start2 end2]    -> unspecific

The sort procedures sort their data using vector merge sort, which is stable. (The reference implementation is, additionally, a "natural" sort. See below for the properties of this algorithm.)

The optional START/END arguments provide for sorting of subranges, and default to 0 and the length of the corresponding vector.

Merge-sorting a vector requires the allocation of a temporary "scratch" work vector for the duration of the sort. This scratch vector can be passed in by the client as the optional TEMP argument; if so, the supplied vector must be of size >= END, and will not be altered outside the range [start,end). If not supplied, the sort routines allocate one themselves.

The merge operations are stable: an element of V1 will come before an equal-comparing element in V2 in the result vector.

VECTOR-MERGE-SORT! leaves its result in V[start,end).

VECTOR-MERGE-SORT returns a vector of length END-START.

VECTOR-MERGE returns a vector of length (END1-START1)+(END2-START2).

VECTOR-MERGE! writes its result into vector V, beginning at index START0, for indices less than END0 = START0 + (END1-START1) + (END2-START2). The target subvector V[start0,end0) may not overlap either source subvector V1[start1,end1) V2[start2,end2).

vector-heap-sort-lib - vector heap sort

heap-sort  v < [start end] -> vector
heap-sort! v < [start end] -> unspecific

These procedures sort their data using heap sort, which is not a stable sorting algorithm.

HEAP-SORT returns a vector of length END-START. HEAP-SORT! is in-place, leaving its result in V[start,end).

vector-quick-sort-lib - vector quick sort

quick-sort  v < [start end] -> vector
quick-sort! v < [start end] -> unspecific

These procedures sort their data using quick sort, which is not a stable sorting algorithm.

QUICK-SORT returns a vector of length END-START. QUICK-SORT! is in-place, leaving its result in V[start,end).

vector-insert-sort-lib - vector insertion sort

insert-sort  v < [start end] -> vector
insert-sort! v < [start end] -> unspecific

These procedures stably sort their data using insertion sort.

INSERT-SORT returns a vector of length END-START. INSERT-SORT! is in-place, leaving its result in V[start,end).

delndup-lib - list and vector delete neighbor duplicates

list-delete-neighbor-dups  lis = -> list
list-delete-neighbor-dups! lis = -> list

vector-delete-neighbor-dups  v = [start end] -> vector
vector-delete-neighbor-dups! v = [start end] -> end'

These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure =. The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.

The comparison function = passed to these procedures is always applied (= x y) where X comes before Y in the containing list or vector.

LIST-DELETE-NEIGHBOR-DUPS does not alter its input list; its answer may share storage with the input list.

VECTOR-DELETE-NEIGHBOR-DUPS does not alter its input vector, but rather allocates a fresh vector to hold the result.

LIST-DELETE-NEIGHBOR-DUPS! is permitted, but not required, to mutate its input list in order to construct its answer.

VECTOR-DELETE-NEIGHBOR-DUPS! reuses its input vector to hold the answer, packing its answer into the index range [start,end'), where END' is the non-negative exact integer returned as its value. It returns END' as its result. The vector is not altered outside the range [start,end').

Examples:

(list-delete-neighbor-dups '(1 1 2 7 7 7 0 -2 -2) =)
  => (1 2 7 0 -2)

(vector-delete-neighbor-dups '#(1 1 2 7 7 7 0 -2 -2) =)
  => #(1 2 7 0 -2)

(vector-delete-neighbor-dups '#(1 1 2 7 7 7 0 -2 -2) = 3 7)
  => #(7 0 -2)

;; Result left in v[3,9):
(let ((v (vector 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6)))
  (cons (vector-delete-neighbor-dups! v = 3)
        v))
   => (9 . #(0 0 0 1 2 3 4 5 6 4 4 5 5 6 6))

Algorithmic properties

Different sort and merge algorithms have different properties. Choose the algorithm that matches your needs: < /p>

Vector insert sort

Stable, but only suitable for small vectors -- O(n^2).

Vector quick sort

Not stable. Is fast on average -- O(n lg n) -- but has bad worst-case behaviour. Has good memory locality for big vectors (unlike heap sort). A clever pivot-picking trick (median of three samples) helps avoid worst-case behaviour, but pathological cases can still blow up.

Vector heap sort

Not stable. Guaranteed fast -- O(n lg n) *worst* case. Poor locality on large vectors. A very reliable workhorse.

Vector merge sort

Stable. Not in-place -- requires a temporary buffer of equal size. Fast -- O(n lg n) -- and has good memory locality for large vectors.

The implementation of vector merge sort provided by this SRFI's reference implementation is, additionally, a "natural" sort, meaning that it exploits existing order in the input data, providing O(n) best case.

Destructive list merge sort

Stable, fast and in-place (i.e., allocates no new cons cells). "Fast" means O(n lg n) worse-case, and substantially better if the data is already mostly ordered, all the way down to linear time for a completely-ordered input list (i.e., it is a "natural" sort).

Note that sorting lists involves chasing pointers through memory, which can be a loser on modern machine architectures because of poor cache & page locality. Pointer *writing*, which is what the SET-CDR!s of a destructive list-sort algorithm do, is even worse, especially if your Scheme has a generational GC -- the writes will thrash the write-barrier. Sorting vectors has inherently better locality.

This SRFIs destructive list merge and merge sort implementations are opportunistic -- they avoid redundant SET-CDR!s, and try to take long already-ordered runs of list structure as-is when doing the merges.

Pure list merge sort

Stable and fast -- O(n lg n) worst-case, and possibly better, depending upon the input list (see above).

Algorithm	Stable?	Worst case	Average case	In-place
V insert	Yes	O(n^2)	O(n^2)	Yes
V quick	No	O(n^2)	O(n lg n)	Yes
V heap	No	O(n lg n)	O(n lg n)	Yes
V merge	Yes	O(n lg n)	O(n lg n)	No
L merge	Yes	O(n lg n)	O(n lg n)	Either

Topics to be resolved during discussion phase

I particularly solicit comments about the following topics.

Include VECTOR-BINARY-SEARCH ?

Should we include (VECTOR-BINARY-SEARCH v key< elt->key key [start end]) in the SRFI? It sort of goes with sorting; it's exactly ten lines of code.
Comparison function before or after the list/vector argument? Should it be (list-sort < lis) or (list-sort lis <)

There is overwhelming consistency among the implementations: data first, < after. Only Chez does it differently.

I have done it in the backwards-compatible way. But I prefer the < first, data after way.

Porting and optimisation

This package should be trivial to port. There are only four non-R4RS bits in the code:

Use of multiple-value return, with the R5RS VALUES procedure, and the simple (RECEIVE (var ...) mv-exp body ...) multiple-value binding macro.
A VECTOR-COPY procedure. This is a tiny little procedure: (vector-copy v [start end])
Use of the LET-OPTIONALS macro from scsh to parse and default optional arguments to three routines. Again, easy to port the macro or rewrite the code to parse, default, and error check the args by hand.
Calls to an ERROR function for complaining about bad arguments.

This code is tightly bummed, as far as I can go in portable Scheme.

You could speed up the vector code a lot by error-checking the procedure parameters and then shifting over to fixnum-specific arithmetic and dangerous vector-indexing and vector-setting primitives. The comments in the code indicate where the initial error checks would have to be added. There are several (QUOTIENT N 2)'s that could be changed to a fixnum right-shift, as well, in both the list and vector code. The code is designed to enable this -- each file usually exports one or two "safe" procedures that end up calling an internal "dangerous" primitive. The little exported cover procedures are where you move the error checks.

This should provide *big* speedups. In fact, all the code bumming I've done pretty much disappears in the noise unless you have a good compiler and also can dump the vector-index checks and generic arithmetic -- so I've really just set things up for you to exploit.

The optional-arg parsing, defaulting, and error checking is done with a portable R4RS macro. But if your Scheme has a faster mechanism (e.g., Chez), you should definitely port over to it. Note that argument defaulting and error-checking are interleaved -- you don't have to error-check defaulted START/END args to see if they are fixnums that are legal vector indices for the corresponding vector, etc.

References & Links

This document, in HTML:
    https://srfi.schemers.org/srfi-32/srfi-32.html
    [This link may not be valid while the SRFI is in draft form.]

This document, in simple text format:
    https://srfi.schemers.org/srfi-32/srfi-32.txt

Archive of SRFI-32 discussion-list email:
    https://srfi-email.schemers.org/srfi-32/

SRFI web site:
    https://srfi.schemers.org/

[CommonLisp]
    Common Lisp: the Language
    Guy L. Steele Jr. (editor).
    Digital Press, Maynard, Mass., second edition 1990.
    Available at http://www.elwood.com/alu/table/references.htm#cltl2

    The Common Lisp "HyperSpec," produced by Kent Pitman, is essentially
    the ANSI spec for Common Lisp:
    http://www.lispworks.com/documentation/HyperSpec/Front/index.htm

[R5RS]
    Revised^5 Report on the Algorithmic Language Scheme,
    R. Kelsey, W. Clinger, J. Rees (editors).
    Higher-Order and Symbolic Computation, Vol. 11, No. 1, September, 1998.
    and ACM SIGPLAN Notices, Vol. 33, No. 9, October, 1998.

    Available at http://www.schemers.org/Documents/Standards/

Acknowledgements

I thank the authors of the open source I consulted when designing this library, particularly Richard O'Keefe, Donovan Kolby and the MIT Scheme Team.

Copyright

SRFI text

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Reference implementation

[Available at github.com/scheme-requests-for-implementation/srfi-32]

Short summary: no restrictions.

While I wrote all of this code myself, I read a lot of code before I began writing. However, all such code is, itself, either open source or public domain, rendering irrelevant any issue of "copyright taint."

The natural merge sorts (pure list, destructive list, and vector) are not only my own code, but are implementations of an algorithm of my own devising. They run in O(n lg n) worst case, O(n) best case, and require only a logarithmic number of stack frames. And they are stable. And the destructive-list variant allocates zero cons cells; it simply rearranges the cells of the input list.

Hence the reference implementation is
Copyright (c) 1998 by Olin Shivers.
and made available under the same copyright as the SRFI text (see above).

Editor: Francisco Solsona