Title

Purely Functional Random-Access Pairs and Lists

Author

David Van Horn

Status

This SRFI is in ``final'' status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 101 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list. You can access post-finalization messages via the archive of the mailing list.

Table of contents

Abstract

Random-access lists [1] are a purely functional data structure for representing lists of values. A random-access list may act as a drop in replacement for the usual linear-access pair and list data structures (pair?, cons, car, cdr), which additionally supports fast index-based addressing and updating (list-ref, list-set). The impact is a whole class of purely-functional algorithms expressed in terms of index-based list addressing become feasible compared with their linear-access list counterparts.

This document proposes a library API for purely functional random-access lists consistent with the R6RS [2] base library and list utility standard library [3].

Issues

Rationale

Functional programming and list hacking go together like peanut butter and jelly, eval and apply, syntax and semantics, or cursing and recursing. But the traditional approach to implementing pairs and lists results in index-based access (list-ref) requiring time proportional the index being accessed. Moreover, indexed-based functional update (list-set) becomes so inefficient as to be nearly unspeakable. Instead, programmers revert the imperatives of the state; they use a stateful data structure and imperative algorithms.

This SRFI intends to improve the situation by offering an alternative implementation strategy based on Okasaki's purely functional random-access lists [1]. Random-access pairs and lists can be used as a replacement for traditional, linear-access pairs and lists with no asymptotic loss of efficiency. In other words, the typical list and pair operations such as cons, car, and cdr, all operate in O(1) time as usual. However, random-access lists additionally support index-based access and functional update operations that are asymptotically cheaper; O(log(n)) for random-access lists versus O(n) for linear-access lists, where n is the length of the list being access or updated. As such, many purely functional index-based list algorithms become feasible by using a random-access list representation for pairs and lists.

The requirements of this SRFI have been designed in such a way as to admit portable library implementations of this feature, such as the reference implementation, while at the same time admit more radical implementations that embrace random-access pairs as the fundamental pair representation.

Specification

Random-access pairs and lists

A random-access pair (or just pair) is a compound structure with two fields called the car and the cdr fields (consistent with the historical naming of pair fields in Scheme). Pairs are created by the procedure cons. The car and cdr fields are accessed by the procedures car and cdr.

Pairs are used primarily to represents lists. A list can be defined recursively as either the empty list or a pair whose cdr is a list. More precisely, the set of lists is defined as the smallest set X such that

The objects in the car fields of successive pairs of a list are the elements of the list. For example, a two-element list is a pair whose car is the first element and whose cdr is a pair whose car is the second element and whose cdr is the empty list. The length of a list is the number of elements, which is the same as the number of pairs.

The empty list is a special object of its own type. It is not a pair. It has no elements and its length is zero.

Note: The above definitions imply that all lists have finite length and are terminated by the empty list.

A chain of pairs is defined recursively as either a non-pair object or a pair whose cdr is a chain of pairs (Note: every value is a chain of pairs). A chain of pairs ending in the empty list is a list. A chain of pairs not ending in the empty list is called an improper list. Note that an improper list is not a list. Whether a given pair is a list depends upon what is stored in the cdr field.

The external representation of pairs is not specified by this SRFI, however the examples below do use the typical notation for writing pair and list values.

Random-access pairs and lists are specified to be fully functional, or, to use the term from the academic literature, fully persistent [1]. Full persistence means that all operations on random-access lists, notably including cons, list-ref, list-set, and list-ref/update, are specified

  1. not to mutate any of their arguments; perforce
  2. to be safe to execute concurrently on shared arguments; and
  3. to suffer no degradation of performance as a consequence of the history of operations carried out to produce their arguments (except as it is reflected in the lengths of those arguments); but permitted
  4. to produce results that share structure with their arguments.

It is usually taken for granted that standard Scheme lists have these properties. This SRFI explicitly specifies that random-access lists share them.

syntax: (quote datum)

Syntax: <Datum> should be a syntactic datum.

Semantics: (quote <datum>) evaluates to the datum value represented by <datum> (see section 4.3 of R6RS). This notation is used to include constants.

When the datum value represented by <datum> contains pair structure, quote produces random-access pairs.

(quote a)                                      ⇒ a
(quote #(a b c))                               ⇒ #(a b c)
(quote (+ 1 2))                                ⇒ (+ 1 2)

As noted in section 4.3.5 of R6RS, (quote <datum>) may be abbreviated as '<datum>:

'"abc"                                         ⇒ "abc"
'145932                                        ⇒ 145932
'a                                             ⇒ a
'#(a b c)                                      ⇒ #(a b c)
'()                                            ⇒ ()
'(+ 1 2)                                       ⇒ (+ 1 2)
'(quote a)                                     ⇒ (quote a)
''a                                            ⇒ (quote a)

As noted in section 5.10 of R6RS, constants are immutable.

Note: Different constants that are the value of quote expression may share the same locations.

procedure: (equal? obj1 obj2) → bool

The equal? predicate returns #t if and only if the (possibly infinite) unfoldings of its arguments into regular trees are equal as ordered trees.

The equal? predicate treats pairs and vectors as nodes with outgoing edges, uses string=? to compare strings, uses bytevector=? to compare bytevectors, and uses eqv? to compare other nodes.

(equal? 'a 'a)                                 ⇒ #t
(equal? '(a) '(a))                             ⇒ #t
(equal? '(a (b) c)
        '(a (b) c))                            ⇒ #t
(equal? "abc" "abc")                           ⇒ #t
(equal? 2 2)                                   ⇒ #t
(equal? (make-vector 5 'a)
        (make-vector 5 'a))                    ⇒ #t
(equal? '#vu8(1 2 3 4 5)
        (u8-list->bytevector
         '(1 2 3 4 5))                         ⇒ #t
(equal? (lambda (x) x)
        (lambda (y) y))                        ⇒ unspecified

(let* ((x (list 'a))
       (y (list 'a))
       (z (list x y)))
  (list (equal? z (list y x))
        (equal? z (list x x))))                ⇒ (#t #t)

procedure: (pair? obj) → bool

Returns #t if obj is a pair, and otherwise returns #f. This operation must take O(1) time.

(pair? '(a . b))                               ⇒ #t
(pair? '(a b c))                               ⇒ #t
(pair? '())                                    ⇒ #f
(pair? '#(a b))                                ⇒ #f

procedure: (cons obj1 obj2) → pair

Returns a newly allocated pair whose car is obj1 and whose cdr is obj2. The pair is guaranteed to be different (in the sense of eqv?) from every existing object. This operation must take O(1) time.

(cons 'a '())                                  ⇒  (a)
(cons '(a) '(b c d))                           ⇒  ((a) b c d)
(cons "a" '(b c))                              ⇒  ("a" b c)
(cons 'a 3)                                    ⇒  (a . 3)
(cons '(a b) 'c)                               ⇒  ((a b) . c)

procedure: (car pair) → obj

Returns the contents of the car field of pair. This operation must take O(1) time.

(car '(a b c))                                 ⇒  a
(car '((a) b c d))                             ⇒  (a)
(car '(1 . 2))                                 ⇒  1
(car '())                               &assertion exception

procedure: (cdr pair) → obj

Returns the contents of the cdr field of pair. This operation must take O(1) time.

(cdr '((a) b c d))                             ⇒  (b c d)
(cdr '(1 . 2))                                 ⇒  2
(cdr '())                               &assertion exception

procedure: (caar pair) → obj
procedure: (cadr pair) → obj
...
procedure: (cdddar pair) → obj
procedure: (cddddr pair) → obj

These procedures are compositions of car and cdr, where for example caddr could be defined by
(define caddr (lambda (x) (car (cdr (cdr x))))).

Arbitrary compositions, up to four deep, are provided. There are twenty-eight of these procedures in all. These operations must take O(1) time.

procedure: (null? obj) → bool

Returns #t if obj is the empty list, #f otherwise. This procedure is equivalent to the null? procedure of the R6RS base library.

procedure: (list? obj) → bool

Returns #t if obj is a list, #f otherwise. By definition, all lists are chains of pairs that have finite length and are terminated by the empty list. This operation must take time bounded by O(log(n)), where n is the number of pairs in the chain forming the potential list.

(list? '(a b c))                               ⇒  #t
(list? '())                                    ⇒  #t
(list? '(a . b))                               ⇒  #f

procedure: (list obj ...) → list

Returns a newly allocated list of its arguments. This operation must take time bounded by O(n), where n is the number of arguments to list.

(list 'a (+ 3 4) 'c)                           ⇒  (a 7 c)
(list)                                         ⇒  ()

procedure: (make-list k) → list
procedure: (make-list k obj) → list

Returns a newly allocated list of k elements. If a second argument is given, then each element is initialized to obj. Otherwise the initial contents of each element is unspecified. This operation must take time and space bounded by O(log(k)).

(make-list 5 0)                                ⇒  (0 0 0 0 0)

procedure: (length list) → k

Returns the length of list. This operation must take time bounded by O(log(n)), where n is the length of the list.

(length '(a b c))                              ⇒  3
(length '(a (b) (c)))                          ⇒  3
(length '())                                   ⇒  0

procedure: (length<=? obj k) → bool

Returns true if obj is a chain of at least k pairs and false otherwise. This operation must take time bounded by O(log(min(k,n))), where n is the length of the chain of pairs.

(length<=? 'not-a-list 0)                      ⇒  #t
(length<=? '(a . b) 0)                         ⇒  #t
(length<=? '(a . b) 1)                         ⇒  #t
(length<=? '(a . b) 2)                         ⇒  #f

procedure: (append list ... obj) → obj

Returns a chain of pairs consisting of the elements of the first list followed by the elements of the other lists, with obj as the cdr of the final pair. An improper list results if obj is not a list. This operation must take time bounded by O(log(n)), where n is the total number of elements in the given lists.

(append '(x) '(y))                             ⇒  (x y)
(append '(a) '(b c d))                         ⇒  (a b c d)
(append '(a (b)) '((c)))                       ⇒  (a (b) (c))
(append '(a b) '(c . d))                       ⇒  (a b c . d)
(append '() 'a)                                ⇒  a

procedure: (reverse list) → list

Returns a newly allocated list consisting of the element of list in reverse order. This operation must take time bounded by O(n) where n is the length of the list.

(reverse '(a b c))                             ⇒  (c b a)
(reverse '(a (b c) 'd '(e (f))))               ⇒  ((e (f)) d (b c) a)

procedure: (list-tail obj k) → obj

Obj should be a chain of pairs with a count of at least k. The list-tail procedure returns the object obtained by omitting the first k elements in obj. This operation must take time bounded by O(log(min(k,n))), where n is the length of the chain of pairs.

(list-tail '(a b c d) 0)                       ⇒  (a b c d)
(list-tail '(a b c d) 2)                       ⇒  (c d)
(list-tail 'not-a-list 0)                      ⇒  not-a-list

Implementation responsibilities: The implementation must check that obj is a chain of pairs whose count is at least k.

procedure: (list-ref pair k) → obj

Pair must be a chain of pairs whose count is at least k + 1. The list-ref procedure returns the kth element of pair. This operation must take time bounded by O(min(k,log(n))), where n is the length of the chain of pairs.

(list-ref '(a b c d) 2)                        ⇒  c

Implementation responsibilities: The implementation must check that pair is a chain of pairs whose count is at least k + 1.

procedure: (list-set pair k obj) → obj

Pair must be a chain of pairs whose count is at least k + 1. The list-set procedure returns the chain of pairs obtained by replacing the kth element with obj. This operation must take time bounded by O(min(k,log(n))), where n is the length of the chain of pairs.

(list-set '(a b c d) 2 'x)                     ⇒  (a b x d)

Implementation responsibilities: The implementation must check that pair is a chain of pairs whose count is at least k + 1.

procedure: (list-ref/update pair k proc) → obj1 obj2

Returns the same results as:

(values (list-ref pair k) 
        (list-set pair k (proc (list-ref pair k))))

but it may be implemented more efficiently.

(list-ref/update '(7 8 9 10) 2 -)              ⇒  9 (7 8 -9 10)

procedure: (map proc list1 list2 ...) → list

The lists should all have the same length. Proc should accept as many arguments as there are lists and return a single value.

The map procedure applies proc element-wise to the elements of the lists and returns a list of the results, in order. Proc is always called in the same dynamic environment as map itself. The order in which proc is applied to the elements of the lists is unspecified.

(map cadr '((a b) (d e) (g h)))                ⇒  (b e h)

(map (lambda (n) (expt n n))
     '(1 2 3 4 5))
                ⇒  (1 4 27 256 3125)

(map + '(1 2 3) (4 5 6))                       ⇒  (5 7 9)

(let ((count 0))
  (map (lambda (ignored)
         (set! count (+ count 1))
         count)
       '(a b)))                                ⇒  (1 2) or (2 1)

Implementation responsibilities: The implementation should check that the lists all have the same length. The implementation must check the restrictions on proc to the extent performed by applying it as described. An implementation may check whether proc is an appropriate argument before applying it.

procedure: (for-each proc list1 list2 ...) → unspecified

The lists should all have the same length. Proc should accept as many arguments as there are lists.

The for-each procedure applies proc element-wise to the elements of the lists for its side effects, in order from the first element to the last. Proc is always called in the same dynamic environment as for-each itself. The return values of for-each are unspecified.

(let ((v (make-vector 5)))
  (for-each (lambda (i)
              (vector-set! v i (* i i)))
            '(0 1 2 3 4))
  v)                                           ⇒  #(0 1 4 9 16)

(for-each (lambda (x) x) '(1 2 3 4))           ⇒  unspecified
(for-each even? '())                           ⇒  unspecified

Implementation responsibilities: The implementation should check that the lists all have the same length. The implementation must check the restrictions on proc to the extent performed by applying it as described. An implementation may check whether proc is an appropriate argument before applying it.

Note: Implementations of for-each may or may not tail-call proc on the last element.

Representation conversion

procedure: (random-access-list->linear-access-list ra-list) → la-list
procedure: (linear-access-list->random-access-list la-list) → ra-list

These procedures convert between (potentially) distinct representations of lists. To avoid confusion, parameters named ra-list range over lists represented with random-access lists, i.e. objects satisfying the list? predicate described above, while parameters named la-list range over lists represented with the more traditional linear-access lists, i.e. objects satisfying the list? predicate of R6RS. In systems that represent all lists as random-access lists, these conversions may simply be list identity procedures.

Implementation requirements

Random-access pairs must be disjoint from all other base types with the possible exception of (linear-access) pairs.

The external representation of random-access pairs is unspecified. The behavior of equal? when given a random-access pair and a sequential-access pair is unspecified in implementations with disjoint representations.

The behavior of eq? and eqv? on random-access pairs must be the same as that for pairs, vectors, or records. Namely, two random-access pair objects are eq? if and only if they are eqv?, and they are eqv? if and only if they refer to the same location in the store.

All argument checking for each operation must be done within the time bounds given for that operation.

Implementations are encouraged, but not required, to support random-access pairs and lists as their primary pair and list representation. In such an implementation, the external representation of random-access pairs and list should be as described in section 4.3.2 (Pairs and lists) of R6RS, the behavior of equivalence predicates on random-access pairs should be as described in section 11.5 (Equivalence predicates) of R6RS, and so on. In short, all pairs should be random-access pairs.

Implementations supporting SRFI Libraries [4] and SRFI 101 must provide the following libraries:

    (srfi :101)                                 ; Composite libraries
    (srfi :101 random-access-lists)

    (srfi :101 random-access-lists procedures)  ; Procedures only
    (srfi :101 random-access-lists syntax)      ; Syntax only

Reference Implementation

A portable R6RS library reference implementation and test suite are provided. The library has been tested successfully using Ikarus (0.0.3), Larceny (0.97), and PLT Scheme (4.2.1.7).

A PLT Scheme specific library that implements purely functional random-access lists, but with an API designed to be consistent with PLT's list libraries rather than R6RS, is available as the RaList package on Planet [5].

References

[1] Purely Functional Random-Access Lists
Chris Okasaki, Functional Programming Languages and Computer Architecture, June 1995, pages 86-95.
http://www.eecs.usma.edu/webs/people/okasaki/pubs.html#fpca95
[2] Revised6 Report on the Algorithmic Language Scheme
Michael Sperber, et al. (Editors)
http://www.r6rs.org/
[3] Revised6 Report on the Algorithmic Language Scheme, Standard Libraries
Michael Sperber, et al. (Editors)
http://www.r6rs.org/
[4] SRFI 97: SRFI Libraries
David Van Horn
http://srfi.schemers.org/srfi-97/
[5] PLaneT: Purely Functional Random-Access Lists
David Van Horn
http://planet.plt-scheme.org/display.ss?package=ralist.plt&owner=dvanhorn

Acknowledgments

I am grateful to the members of the Northeastern University Programming Research Laboratory and PLT (and their intersection) for discussions during the pre-draft development of this SRFI and the library that preceded it. We are all indebted to Okasaki for his elegant solution to this problem. Much of the specification is adapted from text intended to belong to the Scheme community; I thank the editors and authors of the RnRS series collectively for their efforts. I am grateful to Donovan Kolbly for serving as SRFI editor and to Taylor R Campbell, Robert Bruce Findler, Aubrey Jaffer, Shiro Kawai, and Alexey Radul for discussion during the draft period. I thank William D Clinger, Robert Bruce Findler, and Abdulaziz Ghuloum for help writing the implementations of quote specific to Larceny, Ikarus, and PLT, respectively. Support was provided by the National Science Foundation under Grant #0937060 to the Computing Research Association for the CIFellow Project.

Copyright

Copyright (C) David Van Horn 2009. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. REMEMBER, THERE IS NO SCHEME UNDERGROUND. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Donovan Kolbly

Valid XHTML 1.0!