SRFI 101: Purely Functional Random-Access Pairs and Lists

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-101@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

Random-access lists [1] are a purely functional data structure for representing lists of values. A random-access list may act as a drop in replacement for the usual linear-access pair and list data structures (pair?, cons, car, cdr), which additionally supports fast index-based addressing and updating (list-ref, list-set). The impact is a whole class of purely-functional algorithms expressed in terms of index-based list addressing become feasible compared with their linear-access list counterparts.

This document proposes a library API for purely functional random-access lists consistent with the R⁶RS [2] base library and list utility standard library [3].

Issues

Procedure names have been chosen to be consistent with R6RS, even though in some cases such as list-ref and list-tail the choice seems poor since they include the prefix list- even though they do not operate on lists, but chains of pairs, i.e. lists and improper lists, and arbitrary objects, respectively. Although the names have remained the same, the descriptions have been corrected (e.g. using pair or obj instead of list for parameter names). Should the names be changed as well?
To what extent should standard Scheme procedures and syntax that consume or construct lists be included in this proposal? For example, should all of the (rnrs base) library that deals with lists be included? By my count this would mean adding: lambda, apply, vector->list, list->vector, string->list, and list->string. I am inclined to add these. Should all of the (rnrs lists) library be included? These procedures are easily defined in terms of what's given here, and no perfomance advantage is gained by implementig them "under the hood" using the data structures in the reference implementation. I am inclined not to include them.
Should a car+cdr procedure be added?
Should the current syntax and procedures sub-libraries be included?

Rationale

Functional programming and list hacking go together like peanut butter and jelly, eval and apply, syntax and semantics, or cursing and recursing. But the traditional approach to implementing pairs and lists results in index-based access (list-ref) requiring time proportional the index being accessed. Moreover, indexed-based functional update (list-set) becomes so inefficient as to be nearly unspeakable. Instead, programmers revert the imperatives of the state; they use a stateful data structure and imperative algorithms.

This SRFI intends to improve the situation by offering an alternative implementation strategy based on Okasaki's purely functional random-access lists [1]. Random-access pairs and lists can be used as a replacement for traditional, linear-access pairs and lists with no asymptotic loss of efficiency. In other words, the typical list and pair operations such as cons, car, and cdr, all operate in O(1) time as usual. However, random-access lists additionally support index-based access and functional update operations that are asymptotically cheaper; O(log(n)) for random-access lists versus O(n) for linear-access lists, where n is the length of the list being access or updated. As such, many purely functional index-based list algorithms become feasible by using a random-access list representation for pairs and lists.

The requirements of this SRFI have been designed in such a way as to admit portable library implementations of this feature, such as the reference implementation, while at the same time admit more radical implementations that embrace random-access pairs as the fundamental pair representation.

Specification

Random-access pairs and lists

A random-access pair (or just pair) is a compound structure with two fields called the car and the cdr fields (consistent with the historical naming of pair fields in Scheme). Pairs are created by the procedure cons. The car and cdr fields are accessed by the procedures car and cdr.

Pairs are used primarily to represents lists. A list can be defined recursively as either the empty list or a pair whose cdr is a list. More precisely, the set of lists is defined as the smallest set X such that

The objects in the car fields of successive pairs of a list are the elements of the list. For example, a two-element list is a pair whose car is the first element and whose cdr is a pair whose car is the second element and whose cdr is the empty list. The length of a list is the number of elements, which is the same as the number of pairs.

The empty list is a special object of its own type. It is not a pair. It has no elements and its length is zero.

A chain of pairs is defined recursively as either a non-pair object or a pair whose cdr is a chain of pairs (Note: every value is a chain of pairs). A chain of pairs ending in the empty list is a list. A chain of pairs not ending in the empty list is called an improper list. Note that an improper list is not a list. Whether a given pair is a list depends upon what is stored in the cdr field.

The external representation of pairs is not specified by this SRFI, however the examples below do use the typical notation for writing pair and list values.

Random-access pairs and lists are specified to be fully functional, or, to use the term from the academic literature, fully persistent [1]. Full persistence means that all operations on random-access lists, notably including cons, list-ref, list-set, and list-ref/update, are specified

It is usually taken for granted that standard Scheme lists have these properties. This SRFI explicitly specifies that random-access lists share them.

Semantics: (quote <datum>) evaluates to the datum value represented by <datum> (see section 4.3 of R6RS). This notation is used to include constants.

When the datum value represented by <datum> contains pair structure, quote produces random-access pairs.

As noted in section 4.3.5 of R6RS, (quote <datum>) may be abbreviated as '<datum>:

The equal? predicate returns #t if and only if the (possibly infinite) unfoldings of its arguments into regular trees are equal as ordered trees.

The equal? predicate treats pairs and vectors as nodes with outgoing edges, uses string=? to compare strings, uses bytevector=? to compare bytevectors, and uses eqv? to compare other nodes.

Returns #t if obj is a pair, and otherwise returns #f. This operation must take O(1) time.

Returns a newly allocated pair whose car is obj₁ and whose cdr is obj₂. The pair is guaranteed to be different (in the sense of eqv?) from every existing object. This operation must take O(1) time.

Returns the contents of the car field of pair. This operation must take O(1) time.

Returns the contents of the cdr field of pair. This operation must take O(1) time.

procedure: (caar pair) → obj
procedure: (cadr pair) → obj
...
procedure: (cdddar pair) → obj
procedure: (cddddr pair) → obj

These procedures are compositions of car and cdr, where for example caddr could be defined by
(define caddr (lambda (x) (car (cdr (cdr x))))).

Arbitrary compositions, up to four deep, are provided. There are twenty-eight of these procedures in all. These operations must take O(1) time.

Returns #t if obj is the empty list, #f otherwise. This procedure is equivalent to the null? procedure of the R6RS base library.

Returns #t if obj is a list, #f otherwise. By definition, all lists are chains of pairs that have finite length and are terminated by the empty list. This operation must take time bounded by O(log(n)), where n is the number of pairs in the chain forming the potential list.

Returns a newly allocated list of its arguments. This operation must take time bounded by O(n), where n is the number of arguments to list.

Returns a newly allocated list of k elements. If a second argument is given, then each element is initialized to obj. Otherwise the initial contents of each element is unspecified. This operation must take time and space bounded by O(log(k)).

Returns the length of list. This operation must take time bounded by O(log(n)), where n is the length of the list.

Returns true if obj is a chain of at least k pairs and false otherwise. This operation must take time bounded by O(log(min(k,n))), where n is the length of the chain of pairs.

Returns a chain of pairs consisting of the elements of the first list followed by the elements of the other lists, with obj as the cdr of the final pair. An improper list results if obj is not a list. This operation must take time bounded by O(log(n)), where n is the total number of elements in the given lists.

Returns a newly allocated list consisting of the element of list in reverse order. This operation must take time bounded by O(n) where n is the length of the list.

Obj should be a chain of pairs with a count of at least k. The list-tail procedure returns the object obtained by omitting the first k elements in obj. This operation must take time bounded by O(log(min(k,n))), where n is the length of the chain of pairs.

Implementation responsibilities: The implementation must check that obj is a chain of pairs whose count is at least k.

Pair must be a chain of pairs whose count is at least k + 1. The list-ref procedure returns the kth element of pair. This operation must take time bounded by O(min(k,log(n))), where n is the length of the chain of pairs.

Implementation responsibilities: The implementation must check that pair is a chain of pairs whose count is at least k + 1.

Pair must be a chain of pairs whose count is at least k + 1. The list-set procedure returns the chain of pairs obtained by replacing the kth element with obj. This operation must take time bounded by O(min(k,log(n))), where n is the length of the chain of pairs.

Implementation responsibilities: The implementation must check that pair is a chain of pairs whose count is at least k + 1.

The lists should all have the same length. Proc should accept as many arguments as there are lists and return a single value.

The map procedure applies proc element-wise to the elements of the lists and returns a list of the results, in order. Proc is always called in the same dynamic environment as map itself. The order in which proc is applied to the elements of the lists is unspecified.

Implementation responsibilities: The implementation should check that the lists all have the same length. The implementation must check the restrictions on proc to the extent performed by applying it as described. An implementation may check whether proc is an appropriate argument before applying it.

The lists should all have the same length. Proc should accept as many arguments as there are lists.

The for-each procedure applies proc element-wise to the elements of the lists for its side effects, in order from the first element to the last. Proc is always called in the same dynamic environment as for-each itself. The return values of for-each are unspecified.

Representation conversion

procedure:

(random-access-list->linear-access-list ra-list)
→ la-list

procedure:

(linear-access-list->random-access-list la-list)
→ ra-list

These procedures convert between (potentially) distinct representations of lists. To avoid confusion, parameters named ra-list range over lists represented with random-access lists, i.e. objects satisfying the list? predicate described above, while parameters named la-list range over lists represented with the more traditional linear-access lists, i.e. objects satisfying the list? predicate of R6RS. In systems that represent all lists as random-access lists, these conversions may simply be list identity procedures.

Implementation requirements

Random-access pairs must be disjoint from all other base types with the possible exception of (linear-access) pairs.

The external representation of random-access pairs is unspecified. The behavior of equal? when given a random-access pair and a sequential-access pair is unspecified in implementations with disjoint representations.

The behavior of eq? and eqv? on random-access pairs must be the same as that for pairs, vectors, or records. Namely, two random-access pair objects are eq? if and only if they are eqv?, and they are eqv? if and only if they refer to the same location in the store.

All argument checking for each operation must be done within the time bounds given for that operation.

Implementations are encouraged, but not required, to support random-access pairs and lists as their primary pair and list representation. In such an implementation, the external representation of random-access pairs and list should be as described in section 4.3.2 (Pairs and lists) of R⁶RS, the behavior of equivalence predicates on random-access pairs should be as described in section 11.5 (Equivalence predicates) of R⁶RS, and so on. In short, all pairs should be random-access pairs.

Implementations supporting SRFI Libraries [4] and SRFI 101 must provide the following libraries:

Reference Implementation

A portable R6RS library reference implementation and test suite are provided. The library has been tested successfully using Ikarus (0.0.3), Larceny (0.97), and PLT Scheme (4.2.1.7).

A PLT Scheme specific library that implements purely functional random-access lists, but with an API designed to be consistent with PLT's list libraries rather than R6RS, is available as the RaList package on Planet [5].

References

Acknowledgments

I am grateful to the members of the Northeastern University Programming Research Laboratory and PLT (and their intersection) for discussions during the pre-draft development of this SRFI and the library that preceded it. We are all indebted to Okasaki for his elegant solution to this problem. Much of the specification is adapted from text intended to belong to the Scheme community; I thank the editors and authors of the RnRS series collectively for their efforts. I am grateful to Donovan Kolbly for serving as SRFI editor and to Taylor R Campbell, Robert Bruce Findler, Aubrey Jaffer, Shiro Kawai, and Alexey Radul for discussion during the draft period. I thank William D Clinger, Robert Bruce Findler, and Abdulaziz Ghuloum for help writing the implementations of quote specific to Larceny, Ikarus, and PLT, respectively. Support was provided by the National Science Foundation under Grant #0937060 to the Computing Research Association for the CIFellow Project.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Title

Author

Status

Table of contents