Purely Functional Random-Access Pairs and Lists
David Van Horn
This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-101@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Random-access lists [1] are a purely functional
data structure for representing lists of values. A random-access list
may act as a drop in replacement for the usual linear-access pair and
list data structures
(pair?
, cons
, car
, cdr
),
which additionally supports fast index-based addressing and updating
(list-ref
, list-set
). The impact is a whole
class of purely-functional algorithms expressed in terms of
index-based list addressing become feasible compared with their
linear-access list counterparts.
This document proposes a library API for purely functional random-access lists consistent with the R^{6}RS [2] base library and list utility standard library [3].
list-ref
and list-tail
the choice seems poor since they
include the prefix list-
even though they do not
operate on lists, but chains of pairs, i.e. lists and improper
lists, and arbitrary objects, respectively. Although the names
have remained the same, the descriptions have been corrected
(e.g. using pair or obj instead of list
for parameter names). Should the names be changed as well?
(rnrs base)
library that
deals with lists be included? By my count this would mean adding:
lambda
, apply
,
vector->list
, list->vector
,
string->list
, and list->string
. I am
inclined to add these. Should all of the (rnrs
lists)
library be included? These procedures are easily
defined in terms of what's given here, and no perfomance advantage
is gained by implementig them "under the hood" using the data
structures in the reference implementation. I am
inclined not to include them.
car+cdr
procedure be added?
syntax
and procedures
sub-libraries be included?
Functional programming and list hacking go together like peanut butter
and jelly, eval and apply, syntax and semantics, or cursing and
recursing. But the traditional approach to implementing pairs and
lists results in index-based access (list-ref
) requiring
time proportional the index being accessed. Moreover, indexed-based
functional update (list-set
) becomes so inefficient as to
be nearly unspeakable. Instead, programmers revert the imperatives of
the state; they use a stateful data structure and imperative
algorithms.
This SRFI intends to improve the situation by offering an alternative
implementation strategy based on Okasaki's purely functional
random-access lists [1]. Random-access pairs
and lists can be used as a replacement for traditional, linear-access
pairs and lists with no asymptotic loss of efficiency. In other
words, the typical list and pair operations such
as cons
, car
, and cdr
, all
operate in O(1) time as usual. However, random-access lists
additionally support index-based access and functional update
operations that are asymptotically cheaper; O(log(n)) for
random-access lists versus O(n) for linear-access lists,
where n is the length of the list being access or updated.
As such, many purely functional index-based list algorithms become
feasible by using a random-access list representation for pairs and
lists.
The requirements of this SRFI have been designed in such a way as to admit portable library implementations of this feature, such as the reference implementation, while at the same time admit more radical implementations that embrace random-access pairs as the fundamental pair representation.
A random-access pair (or just pair) is a compound
structure with two fields called the car and the cdr fields
(consistent with the historical naming of pair fields in Scheme).
Pairs are created by the procedure cons
. The car and cdr
fields are accessed by the procedures car
and cdr
.
Pairs are used primarily to represents lists. A list can be defined recursively as either the empty list or a pair whose cdr is a list. More precisely, the set of lists is defined as the smallest set X such that
The objects in the car fields of successive pairs of a list are the elements of the list. For example, a two-element list is a pair whose car is the first element and whose cdr is a pair whose car is the second element and whose cdr is the empty list. The length of a list is the number of elements, which is the same as the number of pairs.
The empty list is a special object of its own type. It is not a pair. It has no elements and its length is zero.
Note: The above definitions imply that all lists have finite length and are terminated by the empty list.
A chain of pairs is defined recursively as either a non-pair object or a pair whose cdr is a chain of pairs (Note: every value is a chain of pairs). A chain of pairs ending in the empty list is a list. A chain of pairs not ending in the empty list is called an improper list. Note that an improper list is not a list. Whether a given pair is a list depends upon what is stored in the cdr field.
The external representation of pairs is not specified by this SRFI, however the examples below do use the typical notation for writing pair and list values.
Random-access pairs and lists are specified to be fully functional,
or, to use the term from the academic literature, fully
persistent [1]. Full persistence means
that all operations on random-access lists, notably
including cons
, list-ref
, list-set
,
and list-ref/update
, are specified
It is usually taken for granted that standard Scheme lists have these properties. This SRFI explicitly specifies that random-access lists share them.
syntax: (quote datum)
Syntax: <Datum> should be a syntactic datum.
Semantics: (quote <datum>)
evaluates to
the datum value represented by <datum> (see
section 4.3
of R6RS). This notation is used to include constants.
When the datum value represented by <datum> contains pair
structure, quote
produces random-access pairs.
(quote a) ⇒ a (quote #(a b c)) ⇒ #(a b c) (quote (+ 1 2)) ⇒ (+ 1 2)
As noted in
section 4.3.5
of R6RS, (quote <datum>)
may be abbreviated
as '<datum>
:
'"abc" ⇒ "abc" '145932 ⇒ 145932 'a ⇒ a '#(a b c) ⇒ #(a b c) '() ⇒ () '(+ 1 2) ⇒ (+ 1 2) '(quote a) ⇒ (quote a) ''a ⇒ (quote a)
As noted in section 5.10 of R6RS, constants are immutable.
Note: Different constants that are the value of
quote
expression may share the same locations.
procedure: (equal? obj_{1} obj_{2})
→ bool
The equal?
predicate returns #t
if and only
if the (possibly infinite) unfoldings of its arguments into regular
trees are equal as ordered trees.
The equal?
predicate treats pairs and vectors as nodes
with outgoing edges, uses string=?
to compare strings,
uses bytevector=?
to compare bytevectors, and
uses eqv?
to compare other nodes.
(equal? 'a 'a) ⇒ #t (equal? '(a) '(a)) ⇒ #t (equal? '(a (b) c) '(a (b) c)) ⇒ #t (equal? "abc" "abc") ⇒ #t (equal? 2 2) ⇒ #t (equal? (make-vector 5 'a) (make-vector 5 'a)) ⇒ #t (equal? '#vu8(1 2 3 4 5) (u8-list->bytevector '(1 2 3 4 5)) ⇒ #t (equal? (lambda (x) x) (lambda (y) y)) ⇒ unspecified (let* ((x (list 'a)) (y (list 'a)) (z (list x y))) (list (equal? z (list y x)) (equal? z (list x x)))) ⇒ (#t #t)
procedure: (pair? obj) → bool
Returns #t
if obj is a pair, and otherwise
returns #f
. This operation must take O(1) time.
(pair? '(a . b)) ⇒ #t (pair? '(a b c)) ⇒ #t (pair? '()) ⇒ #f (pair? '#(a b)) ⇒ #f
procedure: (cons obj_{1} obj_{2})
→ pair
Returns a newly allocated pair whose car is obj_{1}
and whose cdr is obj_{2}. The pair is guaranteed to
be different (in the sense of eqv?
) from every existing
object. This operation must take O(1) time.
(cons 'a '()) ⇒ (a) (cons '(a) '(b c d)) ⇒ ((a) b c d) (cons "a" '(b c)) ⇒ ("a" b c) (cons 'a 3) ⇒ (a . 3) (cons '(a b) 'c) ⇒ ((a b) . c)
procedure: (car pair) → obj
Returns the contents of the car field of pair. This operation must take O(1) time.
(car '(a b c)) ⇒ a (car '((a) b c d)) ⇒ (a) (car '(1 . 2)) ⇒ 1 (car '()) &assertion exception
procedure: (cdr pair) → obj
Returns the contents of the cdr field of pair. This operation must take O(1) time.
(cdr '((a) b c d)) ⇒ (b c d) (cdr '(1 . 2)) ⇒ 2 (cdr '()) &assertion exception
procedure: (caar pair) → obj
procedure: (cadr pair) → obj
...
procedure: (cdddar pair) → obj
procedure: (cddddr pair) → obj
These procedures are compositions of car
and cdr
, where for example caddr
could be
defined by
(define caddr (lambda (x) (car (cdr (cdr x)))))
.
Arbitrary compositions, up to four deep, are provided. There are twenty-eight of these procedures in all. These operations must take O(1) time.
procedure: (null? obj) → bool
Returns #t
if obj is the empty
list, #f
otherwise. This procedure is equivalent
to the null?
procedure of the R6RS base library.
procedure: (list? obj) → bool
Returns #t
if obj is a list, #f
otherwise. By definition, all lists are chains of pairs that have
finite length and are terminated by the empty list. This operation
must take time bounded by O(log(n)), where n is the
number of pairs in the chain forming the potential list.
(list? '(a b c)) ⇒ #t (list? '()) ⇒ #t (list? '(a . b)) ⇒ #f
procedure: (list obj ...) → list
Returns a newly allocated list of its arguments. This operation must
take time bounded by O(n), where n is the number of
arguments to list
.
(list 'a (+ 3 4) 'c) ⇒ (a 7 c) (list) ⇒ ()
procedure: (make-list k)
→ list
procedure: (make-list k obj)
→ list
Returns a newly allocated list of k elements. If a second argument is given, then each element is initialized to obj. Otherwise the initial contents of each element is unspecified. This operation must take time and space bounded by O(log(k)).
(make-list 5 0) ⇒ (0 0 0 0 0)
procedure: (length list) → k
Returns the length of list. This operation must take time bounded by O(log(n)), where n is the length of the list.
(length '(a b c)) ⇒ 3 (length '(a (b) (c))) ⇒ 3 (length '()) ⇒ 0
procedure: (length<=? obj k)
→ bool
Returns true if obj is a chain of at least k pairs and false otherwise. This operation must take time bounded by O(log(min(k,n))), where n is the length of the chain of pairs.
(length<=? 'not-a-list 0) ⇒ #t (length<=? '(a . b) 0) ⇒ #t (length<=? '(a . b) 1) ⇒ #t (length<=? '(a . b) 2) ⇒ #f
procedure: (append list ... obj)
→ obj
Returns a chain of pairs consisting of the elements of the first list followed by the elements of the other lists, with obj as the cdr of the final pair. An improper list results if obj is not a list. This operation must take time bounded by O(log(n)), where n is the total number of elements in the given lists.
(append '(x) '(y)) ⇒ (x y) (append '(a) '(b c d)) ⇒ (a b c d) (append '(a (b)) '((c))) ⇒ (a (b) (c)) (append '(a b) '(c . d)) ⇒ (a b c . d) (append '() 'a) ⇒ a
procedure: (reverse list) → list
Returns a newly allocated list consisting of the element of list in reverse order. This operation must take time bounded by O(n) where n is the length of the list.
(reverse '(a b c)) ⇒ (c b a) (reverse '(a (b c) 'd '(e (f)))) ⇒ ((e (f)) d (b c) a)
procedure: (list-tail obj k)
→ obj
Obj should be a chain of pairs with a count of at
least k. The list-tail
procedure returns the
object obtained by omitting the first k elements
in obj. This operation must take time bounded
by O(log(min(k,n))), where n is the length of the
chain of pairs.
(list-tail '(a b c d) 0) ⇒ (a b c d) (list-tail '(a b c d) 2) ⇒ (c d) (list-tail 'not-a-list 0) ⇒ not-a-list
Implementation responsibilities: The implementation must check that obj is a chain of pairs whose count is at least k.
procedure: (list-ref pair k)
→ obj
Pair must be a chain of pairs whose count is at least k +
1. The list-ref
procedure returns the kth
element of pair. This operation must take time bounded
by O(min(k,log(n))), where n is the length of the
chain of pairs.
(list-ref '(a b c d) 2) ⇒ c
Implementation responsibilities: The implementation must check that pair is a chain of pairs whose count is at least k + 1.
procedure: (list-set pair k obj)
→ obj
Pair must be a chain of pairs whose count is at least k +
1. The list-set
procedure returns the chain of
pairs obtained by replacing the kth element
with obj. This operation must take time bounded
by O(min(k,log(n))), where n is the length of the
chain of pairs.
(list-set '(a b c d) 2 'x) ⇒ (a b x d)
Implementation responsibilities: The implementation must check that pair is a chain of pairs whose count is at least k + 1.
procedure: (list-ref/update pair k proc)
→ obj_{1} obj_{2}
Returns the same results as:
(values (list-ref pair k) (list-set pair k (proc (list-ref pair k))))
but it may be implemented more efficiently.
(list-ref/update '(7 8 9 10) 2 -) ⇒ 9 (7 8 -9 10)
procedure: (map proc list_{1} list_{2}
...) → list
The lists should all have the same length. Proc should accept as many arguments as there are lists and return a single value.
The map
procedure applies proc element-wise to
the elements of the lists and returns a list of the results,
in order. Proc is always called in the same dynamic
environment as map
itself. The order in
which proc is applied to the elements of the lists
is unspecified.
(map cadr '((a b) (d e) (g h))) ⇒ (b e h) (map (lambda (n) (expt n n)) '(1 2 3 4 5)) ⇒ (1 4 27 256 3125) (map + '(1 2 3) (4 5 6)) ⇒ (5 7 9) (let ((count 0)) (map (lambda (ignored) (set! count (+ count 1)) count) '(a b))) ⇒ (1 2) or (2 1)
Implementation responsibilities: The implementation should check that the lists all have the same length. The implementation must check the restrictions on proc to the extent performed by applying it as described. An implementation may check whether proc is an appropriate argument before applying it.
procedure: (for-each proc list_{1} list_{2}
...) → unspecified
The lists should all have the same length. Proc should accept as many arguments as there are lists.
The for-each
procedure applies proc element-wise
to the elements of the lists for its side effects, in order
from the first element to the last. Proc is always called in
the same dynamic environment as for-each
itself. The
return values of for-each
are unspecified.
(let ((v (make-vector 5))) (for-each (lambda (i) (vector-set! v i (* i i))) '(0 1 2 3 4)) v) ⇒ #(0 1 4 9 16) (for-each (lambda (x) x) '(1 2 3 4)) ⇒ unspecified (for-each even? '()) ⇒ unspecified
Implementation responsibilities: The implementation should check that the lists all have the same length. The implementation must check the restrictions on proc to the extent performed by applying it as described. An implementation may check whether proc is an appropriate argument before applying it.
Note: Implementations of
for-each
may or may not tail-call proc on the last element.
procedure: (random-access-list->linear-access-list ra-list)
→ la-list
procedure: (linear-access-list->random-access-list la-list)
→ ra-list
These procedures convert between (potentially) distinct
representations of lists. To avoid confusion, parameters
named ra-list range over lists represented with random-access
lists, i.e. objects satisfying the list?
predicate
described above, while parameters named la-list range over
lists represented with the more traditional linear-access lists,
i.e. objects satisfying the list?
predicate of R6RS. In
systems that represent all lists as random-access lists, these
conversions may simply be list identity procedures.
Random-access pairs must be disjoint from all other base types with the possible exception of (linear-access) pairs.
The external representation of random-access pairs is unspecified.
The behavior of equal?
when given a random-access pair
and a sequential-access pair is unspecified in implementations with
disjoint representations.
The behavior of eq?
and eqv?
on
random-access pairs must be the same as that for pairs, vectors, or
records. Namely, two random-access pair objects are eq?
if and only if they are eqv?
, and they
are eqv?
if and only if they refer to the same location
in the store.
All argument checking for each operation must be done within the time bounds given for that operation.
Implementations are encouraged, but not required, to support random-access pairs and lists as their primary pair and list representation. In such an implementation, the external representation of random-access pairs and list should be as described in section 4.3.2 (Pairs and lists) of R^{6}RS, the behavior of equivalence predicates on random-access pairs should be as described in section 11.5 (Equivalence predicates) of R^{6}RS, and so on. In short, all pairs should be random-access pairs.
Implementations supporting SRFI Libraries [4] and SRFI 101 must provide the following libraries:
(srfi :101) ; Composite libraries (srfi :101 random-access-lists) (srfi :101 random-access-lists procedures) ; Procedures only (srfi :101 random-access-lists syntax) ; Syntax only
A portable R6RS library reference implementation and test suite are provided. The library has been tested successfully using Ikarus (0.0.3), Larceny (0.97), and PLT Scheme (4.2.1.7).
A PLT Scheme specific library that implements purely functional random-access lists, but with an API designed to be consistent with PLT's list libraries rather than R6RS, is available as the RaList package on Planet [5].
I am grateful to the members of the
Northeastern University
Programming Research Laboratory
and PLT (and their intersection)
for discussions during the pre-draft development of this SRFI and the
library that preceded it. We are all indebted to Okasaki for his
elegant solution to this problem. Much of the specification is
adapted from text intended to belong to the Scheme community; I thank
the editors and authors of the RnRS series collectively for their
efforts. I am grateful to Donovan Kolbly for serving as SRFI editor
and to Taylor R Campbell, Robert Bruce Findler, Aubrey Jaffer, Shiro
Kawai, and Alexey Radul for discussion during the draft period. I
thank William D Clinger, Robert Bruce Findler, and Abdulaziz Ghuloum
for help writing the implementations of quote
specific to
Larceny, Ikarus, and PLT, respectively. Support was provided by the
National Science Foundation under Grant #0937060 to the Computing
Research Association for the CIFellow Project.
Copyright (C) David Van Horn 2009. All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.