Title

Collections

Author

Scott G. Miller

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-44@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2003-04-23
Draft: 2003-04-28--2003-07-28
Revised: 2003-07-02
Revised: 2003-07-24
Revised: 2003-08-10
Revised: 2003-08-19
Revised: 2003-08-27
Revised: 2003-09-11
Revised: 2003-10-07
Revised: 2003-10-08
Revised: 2003-11-24
Revised: 2003-11-28
Final: 2004-03-07

Abstract

A Collections API which defines a common naming scheme and set of operations for creating, accessing, and manipulating common datastructures in Scheme. The API defines accessors, a common protocol for value access via generic and specific enumeration, and functions for inter-datastructure cooperation. Finally, a concrete specification of a compliant set of operators for the standard Scheme heterogenous datastructures (lists and vectors) and for the homogeneous Scheme string is provided.

Issues
Rationale
Specification

Collections Hierarchy
Storage Model
Dispatch Mechanism
Typing
Collection Attributes

Ordered Collections
Directional Collections
Purely Mutable Collections
Limited Collections
Folding Enumerators

Enumeration Stability
Equivalence
Immutable Collections
Homogeneity
Size versus Length
Infinite Collections
Ordered Collections
Bags, Sets, Maps and Value Equality
Sequences

Flexible Sequences
Procedures

Functional, Linear Update, and Purely Mutable Collections
Enumeration
Collections
Ordered Collections
Directional Collections
Limited Collections
Purely Mutable Collections
Bags
Sets
Sequences
Flexible Sequences
Maps

Scheme Collections

Lists
Association List Maps
Vectors
Strings

Compatibility

Extensions to R5RS
Relation to other SRFIs
Relation to Scheme implementations
Implementation
Future Work
References

Issues

Several functions in this SRFI are required to take arbitrary collections or major collection subtypes as input. This requires that the function in question be able to determine the specific type of the collection and dispatch to an appropriate implementation.

As standard Scheme lacks the ability to automatically dispatch on different function behaviors based on the types of arguments, it is difficult to devise a portable implementation of this SRFI which allows arbitrary future collections. It is expected that Scheme system implementors will take advantage of generic function, module system, or object-oriented features of their systems to implement this SRFI efficiently. At the same time it is hoped that a future SRFI will specify a mechanism which will allow a portable and efficient implementation of collections to exist. The reference implementation in this SRFI compromises by using the portable Tiny-CLOS object system as a framework for collection-function dispatch.

It should finally be noted that this SRFI does not require that all function calls to collection supertypes remain capable of dynamic dispatch. If a compiler or optimizer can infer the types of a collection function's arguments, or a user provides hints to that effect, it is permitted to monomorphize the call. That is, to statically compile a more specific but semantically equivalent collection function in place.

Rationale

The Scheme language provides two basic datastructures for containing any first-class value. Scheme pairs, and by extension lists, provide a variable sized datastructure with constant time extension and linear time access. Scheme vectors provide a fixed sized datastructure with typically constant time access. From these two datastructures it is possible to build up many other types of datastructures, including hash tables, trees, and sets.

However, no such datastructures are specified by the Scheme standard, and preceding this SRFI, no attempt has been made to specify a standard API for these most common datastructures. It is anticipated that this will change in the near future. In advance of these specifications, this SRFI will outline a consistent naming scheme and set of operators and semantics that future data structure specifications may follow. The intended result is to allow current and future standard datastructures to look and behave in a predictable fashion, and to allow the values stored in them to be easily and efficiently transferred to and from different datastructures as necessary.

This SRFI specifies a folding enumerator as the primary means of traversing and obtaining all values of a collection. The rationale for this paradigm is to support a diverse variety of collections whose contents may reside in memory, slow secondary storage, over networks, or as results of computational processes. The folding enumerator allows for precise control over the resources of the underlying collection, while still providing flexible generic access to collections.

Allowing stepwise enumeration over a collection, as opposed to a collection->list function allows the collection enumerator to fetch its elements lazily from some slower or more costly storage or retrieval mechanism. The canonical example is a collection whose elements are rows from a complex database query. While it may not be possible to store the entire results of the query in memory on which a map or for-each operation could be performed, it may be much more feasible to allow the enumerator to fetch each row one at a time, saving both memory and delay in transfering the results to the Scheme program. Additionally, the enumerator may cooperate much more cleanly with a garbage collector as references to each value retrieved can be discarded after processing, allowing the garbage collector to free large values immediately if the Scheme program does not itself capture a reference to the collection value. It has also been shown that enumeration can be the basis for other traversal protocols such as cursors, through straightforward conversion operators. Finally, the enumeration protocol described allows for early termination by the accessing function, which may obviate the need to transfer many unaccessed future elements.

Specification

Collections Hierarchy

Collection | +--------+---------+ | | | Bag Set Map | Sequence | Flexible Sequence

This SRFI first defines a hierarchy of possible datastructure types. Each subtype inherits the behavior and functionality of its parent (its interfaces). Any collection may be immutable.

Collection

The base type. All collections have a concept of size and emptiness, and it is possible to perform a folding enumeration over the values of any collection.

Bag

A bag is a collection of possibly ordered, possibly unique values. Given a bag, it is only possible to determine if it contains or does not contain any of a given value.

Set

A set is a collection of unique values. Similar to but distinct from a bag, it is only possible to determine if a set contains or does not contain a given value. Unlike a bag, only one of any value can exist in the collection at a time. Removing one value from the set means that the set no longer contains the value at all.

Sequence

A sequence extends a bag with the ability to access and replace all of its members using a contiguous sequence of non-negative exact integers. Sequences are stable. That is, if no mutation operation occurs on a sequence, its elements must remain in the same order according to their previously observed indices. When values are added to a base Sequence, they may be added anywhere in the sequence. Sequences may or may not allow multiple instances of a value.

Flexible Sequence

A flexible sequence is a datastructure that allows new values to be added and old values removed at any position in the sequence, as specified by the programmer.

Map

A map is a collection which stores associations between single keys and single values. Maps typically provide better than linear time retrieval of the value given a key.

Typing

This SRFI uses the word type to refer to a class of datastructures as a whole, and subtyping to refer to classes of datastructures who have the properties of and pass the type-checking predicates of their parents. This however is a metaphor used to more concisely describe the signatures of operators that each concrete collection must provide. It does not necessarily map to any actual type system or inheritance rules. Implementations are permitted to use type systems to enforce or facilitate providing the correct set of operators and behaviors, but this is not required.

Collection Attributes

In addition to the conceptual type hierarchy which dictates the available functions for a given collection, collections also may possess any of four (in this SRFI) attributes which specify whether certain global functions are well defined for the collection. Attributes may also specify additional parameters to the collection's constructor.

This SRFI specifies four such attributes: orderedness, directionality, mutability, and limitedness.
Ordered Collections

Ordered collections maintain an ordering to their values (or in the case of maps, their keys) which is based on those values' relationships to each other. The collection fold is guaranteed to enumerate over the collection's values/keys in increasing value precedence as defined by the collection's ordering function.

Directional Collections

Directional collections have a notion of a left/right handedness to their interface which is independent of any relationships between their values. For example, sequences are always directional collections, for which the left side implies lower indices, while the right side implies higher indices. Directional collections must provide operators to retrieve the left and rightmost values, and may provide update operators to add or delete values to/from the left and right hand sides of the collection. This is useful for non-sequences as well, for example to implement a queue datastructure as a bag.

The collection left fold operator must enumerate over the values of a directional collection from left to right order as defined by the directional collection, and the right fold must correspondingly enumerate from right to left.

A collection cannot be both ordered and directional simultaneously. The left and right accessors of ordered collections access least-precedent and most-precedent values, rather than the left and right values which are unrelated to precedence in directional collections.

Limited Collections

Limited collections are collections with a fixed capacity and thus a fixed maximum size. It may be possible to add and remove values from limited collections by using a special value to indicate unused slots in the underlying representation, so limited collections do not necessarily have fixed size. Update procedures which add or remove values from a limited collection may be undefined on limited collections.

Purely Mutable Collections

This SRFI's API is accommodating to purely functional and linear-updating collections by default. Some collections may be purely mutable. That is, updates are done purely through side effects within the collection. In such a collection,
(eq? <input collection> (update-procedure  <input collection>))
will always return #t. Much Scheme code is often written to such purely mutable collections (hash tables are an example). It is not possible to support these collections by requiring that code use the side-effecting procedures defined in this SRFI in the linear-update fashion, as it would require significant changes to such code and could be expensive in terms of unnecessarily allocated storage.

For these reasons, pure mutability is an additional attribute that may be held by a collection. When so held, the results of the side-effecting update procedures may be ignored, Programmers may then utilize these procedures in the usual manner and agnostic to the collection type so long as the pure mutability attribute is checked. Note that purely mutable collections exist comfortably within the protocol for linear update procedures, so code written to the latter will accomodate the former with no changes.
Storage Model

Collection instances, as specified in this SRFI, must be treated as first-class values. This means they must be storable in any variable or heterogenous datastructure. Collections may be divisible, that is, may be composed as combinations of like-typed collections. This SRFI defines modification operators for both purely functional and any range of mutable datastructures.

Dispatch Mechanism

The precise mechanism of dispatch for collection operators capable of receiving multiple collection types is deliberately unspecified. Many existing Scheme systems have generic procedure functionality and/or object oriented extensions that permit dispatch of functions based on the type of their arguments. In other cases, self-contained programs may wish to include a small, fixed set of collections and eschew the overhead of a generic dispatch system, instead relying on simple dispatch using explicit type checking (for example with a cond statement). This SRFI is designed to allow integration with any of those systems. Should a standard dispatch system emerge in the future, this SRFI can be cleanly integrated with it.

In the case where a generic dispatch system is used, it needs only a minimum set of features. In particular, a single dispatch system should be sufficient to implement collection extension, as all operators where more than one generic type may be passed can be implemented in terms of generic dispatch on the first argument and utilization of a second generic operator for its implementation. For example, *-add-from need only dispatch on the primary collection, and can use the fold operator to iterate through the second collection's values to add them.

Future SRFIs may add additional abstract collection types, but such types must inherit their interface singly from an existing abstract type (including the base collection type) in this SRFI. Concrete collection types can inherit from an abstract type or other concrete types, but abstract types may not inherit their interface from a concrete type. For example, a new dictionary abstract type may yield #t when applied from dictionary??, and map? but must not for alist-map?.

Folding Enumerators

Regardless of the collection type, all collections must be supported by the generic folding enumeration procedures, collection-fold-left and collection-fold-right in addition to providing concrete fold operators specific to the collection type (%-fold-left, %-fold-right) (see Procedures). The collection fold procedures take a collection and a folding function as arguments, as well as any number of optional seed values. The collection fold procedures then invoke the folding function on a value of the collection and the seeds, and receive from the folding function a proceed value followed by a like number of seed return values. If the proceed value is non-false, the collection fold procedure then invokes the same fold function again with the next collection value and the newly returned seed values. This continues until the elements of the collection are exhausted, or the fold function returns false as the proceed value, at which point the collection fold procedure returns the seed values (if any) returned by the last call to the fold function.
In an ordered collection, the left fold procedure is required to apply the values in the order of the underlying ordered collection. In addition, an ordered collection must apply the values in the reverse order as the left fold when enumerated by collection-fold-right. The left fold procedure must proceed left to right through the values of a directional collection such as a sequence (increasing index order), while the right fold proceeds right to left (decreasing index order.)

Finally, all Map and Sequence datastructures must be supported by collection-fold-keys-left and collection-fold-keys-right which during enumeration applies the fold function to both the map's keys or the sequence's indices and their corresponding values.

Enumeration Stability

It is undefined in this SRFI whether the process of enumeration over a collection is stable. A stable enumeration will enumerate over the values that exist in a given collection at a certain time. Removing, adding, or changing values from the underlying collection while enumerating will not cause those values to be missing, discovered, or the new value observed respectively in future steps of the enumeration. Modifying a collection while enumerating is permitted to cause an error in either the collection modification or in the enumeration, though this behavior is discouraged.

Enumerations must, however, be stable in the absence of updates. That is, once started, an enumeration should touch all values in the collection (in the proper order as described above) so long as no updates are performed on the underlying collection. Enumerations need not be stable between each other. That is, though an enumeration may be internally stable, a second enumeration need not return the values in the same order, unless the collection type and attributes require this as described above.

Note that if a collection is purely functional, it will by definition be stable in the presence of modification, as the modified collection will be space-distinct from the enumerated collection.

Equivalence

Collections are considered equivalent with respect to Scheme's equal? operator when they are of the same type and contain a like number of values or mappings, and where each value/mapping in one collection is equal? to a value in the second collection.

For sequences and ordered collections the ordering of the contained values must also be equivalent. For maps, each key in the first map must be equal to a key in the second map, and the value(s) mapped to by that key in each map must also be equivalent. If the map is ordered, the order of the mappings must also be the same.

Equivalence is checked with this SRFI's *= operator, described later. Implementations may also extend Scheme's equal? and eqv? operators to collections, as long as the above semantics hold for equal?. In other words,
(equal? collection ...)
        
must return the same value as
(collection= equal? collection ...)
        
if the collections are of the same type, otherwise it must return false.

Immutable Collections

Any collection or instance of a collection may be immutable, or made immutable at any point in its lifecycle. It is an error to add, remove, or modify any values or mappings in an immutable collection.

Homogeneity

Collections may be homogeneous (capable of storing values of only one type), though it is anticipated that the majority of collections will be heterogenous. If a collection is homogeneous, it is an error to attempt to store a value or key of the wrong type within it.

Size versus Length

Most collections possess a concept of size. The size of a collection is the number of values or mappings it currently contains. This differs from the concept of length in Scheme datastructures, which corresponds to the number of cons cells or vector slots the structure contains. A collection may contain more cells or slots than required to contain its values or mappings. An example might be a hashtable collection, which may at any given time contain numerous unoccupied, discontiguous cells. This matter is confused by the collections specified in this API, whose size and length are the same.

Put another way, collection-size should return the number of enumeration steps that will occur, if known.

A collection may not have such a concept, in which case the collection-size function must return #f. Infinite collections (such as the collection of natural numbers) or lazy collections (such as a network stream) are examples of size-indeterminate collections.

Infinite Collections

Infinite collections are collections which contain or generate a potentially unlimited number of values. Infinite collections must return #f from the collection-size function. Enumerations over an infinite collection may proceed indefinitely, and may be haltable only via the folding function.

Furthermore, left enumeration over an infinite collection should whenever possible not require an unboundedly increasing amount of storage as the enumeration proceeds. It should be possible to enumerate indefinitely without consuming a fatal amount of resources assuming the fold function does not contribute to resource consumption. The behavior of collection-fold-right is undefined on infinite collections. Furthermore, the collection fold operators are strict. Applying a fold operator to a stream will force each of its values as they are applied to the fold function.

Some collections may use a self-referential representation, such as the circular list for redundant value access. The purpose of these collections is not to appear to contain an infinite number of values, and are thus not considered infinite. Naive iteration may proceed indefinitely, but the folding enumerations of this SRFI must halt after enumerating over the actual values of such a collection. collection-size should return the number of actual values in the collection unless it cannot efficiently do so.

Ordered Collections

Some collections maintain an ordering. These ordered collections provide the additional property of guaranteeing a left fold will progress over the collection in a least to greatest value precedence fashion, as defined by the collection's ordering function. An ordered collection's constructor must take such a function as input. An ordering function takes two arguments, and returns a boolean, indicating whether the first value is to take precedence in the collection over the second. As an example, an ordered collection of numbers may use < or <= to order numeric values added to the collection.

In order to ensure a consistent ordering in the collection, the ordering function must return the same result for like inputs over time (i.e. it must be functional). In most cases, an ordering function should also treat like values as if tested using equal? in order to ensure that duplicate values are stored and retrieved consistently.

Many collections may derive the equivalence function from the ordering function since the ordering function has the property of being strict weak. It is suggested that ordered collections do so if no equivalence function is provided by the programmer. Otherwise the collection should document its behavior in this respect.

Bags, Sets, Maps and Value Equality

Maps store mappings from keys to values. Bags and Sets store values in an opaque fashion which only indicates the presence or absence of an object. In order to determine whether a given value exists in a set or bag, or whether a given key matches another in a map, these collections must use an equivalence function. Other collections may use equivalence functions for other purposes.

Any collection may require or accept an equivalence function in its primary constructor (make-%) and in its initializing constructor (%). The provided equivalence function must take two values as input and return true if they should be considered equivalent for the purposes of contains?, value insertion, or removal. If provided, an equivalence function follows an ordering function in an ordered collection, but precedes any other arguments. If not provided, eqv? is used as a default equivalence function.

Maps additionally may take a second equivalence function for comparing keys. Thus, the initializing constructor for a map which accepts user-defined equivalence functions takes the form (% [value-equivalence-function [key-equivalence-function]] ...) . If not provided, the map must use the value equivalence function for comparing values and keys.

Sequences

A sequence is a directional collection of values named by a contiguous sequence of exact integers in the range [0,(collection-size sequence)). Scheme vectors are a natural example of a sequence. Sequences may or may not be limited. If they are not, sequences allow new values to be added at an undefined point in the sequence, after which some value will exist at the index (- (collection-size sequence) 1).

Sequences may allow other indices outside the range specified above to retrieve values that can also be retrieved with indices inside the normal range. For example, a circular list sequence may allow you to retrieve the last value at position (- (collection-size seq) 1) and at position -1. Such extensions must be well documented in the specification of that sequence.

Flexible Sequences

Flexible sequences are sequences which allow insertion of values at arbitrary points in the sequence. Inserting a value at any position except the end of the sequence causes all values with higher indices than the insertion point to shift right, thus having an index increased by one. Similarly, if values are removed from a sequence at any position except the end, all values with higher indices are shifted left, so their former indices are now decreased by one.

Procedures

Below we describe the naming conventions and semantics of Collections API functions, grouped by collection type. In each function, the name of the collection would replace the percentage mark in the function prototype. Function definitions in child collection types or in attributes absolutely override the same named function in the parent. For example make-% in the ordered collection requires an equivalence function while general collection constructors do not.

When * is encountered in the definitions below, it is implied that the asterisk is replaced with a function for the name of the section type and all of its subtypes. For example, if we had a 'list' flexible sequence collection, the functions list-contains?, flexible-sequence-contains?, sequence-contains?, bag-contains? must all exist, but collection-contains? does not. In addition, it is an error to apply any such function to a collection whose type does not satisfy that implied by the function name.

Encountering * as a function argument indicates that the argument must be a collection of the type the function is defined for, or any sub-type. These functions are polymorphic on the input collection type.

When % is encountered in the definitions below, the actual name of the collection is implied. Again, assuming a 'list' flexible sequence, make-% implies that the function make-list exists. % as a return value from a constructor indicates a collection of that specific type.

Encountering % as a return type indicates that a collection of the same type as the input is returned.

Occasionally an operator will be specified in a collection subtype which was already defined with the same signature in a supertype or by an attribute which the collection subtype is known to support. This is done to clarify the semantics of the operator as defined for that specific subtype.

Finally, some procedures defined below indicate more than one item as a return value (following => in the definition.) These procedures do in fact return more than one value as multiple Scheme values. These values are returned as if by the values Scheme function, not as a Scheme list.
Functional, Linear Update, and Purely Mutable Collections

Functions in this SRFI which modify a collection are provided in two flavors. Both must be implemented by collections. Purely functional updating functions must not side effect the input collection, but must return a new collection which reflects the update. The returned collection may share structure with the input collection, but the effects of the update must not be reflected in the input collection.

The linear update/purely mutable update operators share the name of the functional update function but have the bang (!) character appended. They too return a collection, which is allowed to be the same as and/or share structure with the input collection. However, side-effects to the input collection are allowed. The caller must use the returned collection to view the effects of the update. The structure of the input collection after the modification is undefined.

Purely mutable collections will only side effect the input collection. They will therefor always return the input collection as the output collection. The programmer may ignore the resulting value if she is certain that the collection is purely mutable. Operating on the input collection after an update to a linear updating collection may have unexpected results.

A collection may naturally be amenable to either purely functional or purely side-effected updates. In the former case, the linear updating version of the procedure may return the purely functionally updated collection. This does not conflict with the definition of the linear update procedure. Conversely, the purely functional updating function can return a distinct collection by cloning a collection which cannot be functionally updated, and performing the side-effecting modifications to the clone of the input collection.

As the case of functionally updating a collection whose structure is updatable only using side-effects can be expensive (due to the worst-case need to clone a large collection), the specifications of all collections are highly encouraged to document the nature of their compatibility and efficiency for both the functional and side-effecting update functions.

Enumeration
procedure: collection-fold-left collection fold-function seed ... => seed ...
procedure: %-fold-left % fold-function seed ... => seed ...

fold-function is a procedure which accepts one more than the number of seed values. The function accepts a single collection value as its first argument, and the seeds as remaining arguments. It must then return a proceed value, which if false halts the enumeration, as well as an equal number of returned seed values as arguments. These seed values are then passed to the next call to the fold function on the next collection value.

When the collection values are exhausted or a false proceed value is received from the fold function, the enumeration ceases and the fold operator returns the last set of seed values returned by the fold-function.
procedure: collection-fold-right collection fold-function seed ... => seed ...
procedure: %-fold-right % fold-function seed ... => seed ...

Behaves like collection-fold-left, except that the fold function is applied to the values of the collection in reverse order.
procedure: collection-fold-keys-left collection fold-function seed ... => seed ...
procedure: %-fold-keys-left % fold-function seed ... => seed ...

Behaves like collection-fold-left, but enumerates over the keys or indices of a map or sequence respectively as well as the corresponding values. This procedure is only defined for sequences and maps.

Also, the fold function of a key enumerator must accept two more operands than the number of seed values. The first two operands to the function are the key and corresponding value at the current point in the enumeration.

procedure: collection-fold-keys-right collection fold-function seed ... => seed ...
procedure: %-fold-keys-right % fold-function seed ... => seed ...

Behaves like collection-fold-keys-left, but enumerates in the reverse order over the keys or indices of an ordered map or sequence and values. This procedure is only defined for sequences and ordered maps.

Collections

procedure: collection? value => value

Returns a non-false value if the provided value is a collection.

procedure: collection-name collection => symbol (%)

Returns the collection name of the provided collection. The name is a symbol containing the type name of the specific collection. A collection whose constructor is make-list, for example, would have the symbol list returned from collection-name

procedure: %? value => value

Returns a non-false value iff the provided value is an instance of the specific collection type.

procedure: *-size * => exact integer

If the collection has a concept of size, this function returns the number of actual values in the collection (or mapped values in a map) If it does not, #f must be returned. When an integer is returned from this function, it indicates that an enumeration will proceed exactly *-size steps in the absence of any updates.

procedure: *-count * value=> exact integer

Counts the number of times value occurs in the collection, according to the collection's equivalence function.

procedure: *-get-any * [absence-thunk]=> value

Returns a value from the given collection. It is unspecified which of its values is returned. If the collection is empty, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-empty? * => boolean

Returns a non-false value iff the given collection is known to be empty. This function should return false if it is known that there are values within the collection, or if it is unknown whether any values exist.

procedure: *->list * => list

Returns a newly allocated list containing the values of the collection. This can be done trivially with enumeration, but an implementation may choose to allow this function to behave more efficiently on certain collections. If the collection is ordered, the list must contain the values of the collection in the same order as the left collection fold.

procedure: *-clear * => %
procedure: *-clear! * => %

Clears the collection. In all cases a collection of the same type as the input is returned with no values or mappings. It is an error to clear an immutable collection and may be an error to clear a limited collection.

procedure: *= elt= * ... => boolean

Compares the provided (zero or more) collections for equivalence given the equivalence predicate elt=. If fewer than two collections are provided, a non-false value is returned. For any other number of collections, the collections are compared pairwise from left to right. As soon as any two collections contain different numbers of values or mappings, or their values aren't equivalent as in the section "Equivalence" above with the provided elt= function used for value comparison, false is returned. If all collections are equivalent, a non-false value is returned.

procedure: make-% => %

Constructs a % collection.

procedure: % value ... => %

Constructs a % collection with zero or more values provided as its initial contents.

procedure: *-copy * => %

Creates a new collection whose type and contents are the same as the collection passed as an operand, but which is distinct enough in storage that the new collection cannot be affected by modifications to the input collection and vice versa. This copy is shallow, that is, values are copied to the new collection in a way that preserves object identity.

Ordered Collections

procedure: ordered-collection? value => value

Returns a non-false value if the provided value is an ordered collection.

procedure: *-ordering-function * => procedure

Returns the ordering function of the provided ordered collection.

procedure: *-get-left * [absence-thunk]=> value

Returns the leftmost (least precedent) value in the ordered collection. If the collection is empty, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-get-right * [absence-thunk]=> value

Returns the rightmost (most precedent) value in the ordered collection. If the collection is empty, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-delete-left * => % value
procedure: *-delete-left! * => % value

Removes the leftmost (least precedent) value from the collection, returning two values, the updated collection and the value removed.

procedure: *-delete-right * => % value
procedure: *-delete-right! * => % value

Removes the rightmost (most precedent) value from the collection, returning two values, the updated collection and the value removed.

procedure: make-% ordering-function => %

Constructs a % ordered collection whose ordering is determined by the provided ordering function.

Limited Collections

procedure: limited-collection? value => value

Returns a non-false value if the provided value is an limited collection.

Purely Mutable Collections

procedure: purely-mutable-collection? value => value

Returns a non-false value if the provided value is a purely mutable collection.

Directional Collections

procedure: directional-collection? value => value

Returns a non-false value if the provided value is a directional collection.

procedure: *-get-left * [absence-thunk]=> value

Returns the leftmost value in the directional collection, as defined by the collections notion of left/right handedness. If the collection is empty, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-get-right * [absence-thunk]=> value

Returns the rightmost value in the directional collection, as defined by the collections notion of left/right handedness. If the collection is empty, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-insert-left * value=> %
procedure: *-insert-left! * value=> %
Adds a value to the leftmost position in the directional collection. A directional collection may not support this operation, but if it does, it must ensure that:
      (*-get-left (*-insert-left <*> <value>)) ;=> <value>
procedure: *-insert-right * value=> %
procedure: *-insert-right! * value=> %
Adds a value to the rightmost position in the directional collection. A directional collection may not support this operation, but if it does, it must ensure that:
      (*-get-right (*-insert-right <*> <value>)) ;=> <value>
procedure: *-delete-left * => % value
procedure: *-delete-left! * => % value
Removes the leftmost value from the directional collection, returning two values, the updated collection and the value removed. A directional collection may not support this operation, but if it does, it must preserve the following semantics:
      (let ((col (make-* values ...)))
        (call-with-values 
           (lambda ()
             (*-delete-left (*-insert-left * a-value)))
           (lambda (newcol val)
             (*= equal? col newcol)))) ; => #t
procedure: *-delete-right * => % value
procedure: *-delete-right! * => % value
Removes the rightmost value from the directional collection, returning two values, the updated collection and the value removed. A directional collection may not support this operation, but if it does, it must preserve the following semantics:
      (let ((col (make-* values ...)))
        (call-with-values 
           (lambda ()
             (*-delete-right (*-insert-right * a-value)))
           (lambda (newcol val)
             (*= equal? col newcol)))) ; => #t
Bags

procedure: bag? value => value

Returns a non-false value if the provided value is a bag.

procedure: *-equivalence-function * => procedure

Returns the equivalence function for the given bag.

procedure: *-contains? * value => boolean

Returns a non-false value if the bag contains any instances of the given value.

procedure: *-add * value => %
procedure: *-add! * value => %

Adds a single value to a bag.

procedure: *-delete * value => %
procedure: *-delete! * value => %

Removes a single instance of the given value from the bag.

procedure: *-delete-all * value => %
procedure: *-delete-all! * value => %

Removes any instances of the given value from the bag.

procedure: *-add-from * source-bag => %
procedure: *-add-from! * source-bag => %

Adds all the values in the source bag to the destination (first argument).

procedure: *-delete-from * source-bag => %
procedure: *-delete-from! * source-bag => %

Removes one instance of each value found in source-bag from the destination (first argument).

procedure: *-delete-all-from * source-bag => %
procedure: *-delete-all-from! * source-bag => %

Removes any instances of each value found in source-bag from the destination (first argument).

Sets

procedure: set? value => boolean

Returns a non-false value if the provided value is a set.

procedure: *-equivalence-function * => procedure

Returns the equivalence function for the given set.

procedure: *-contains? * value => boolean

Returns a non-false value if the set contains the given value.

procedure: *-subset? * sets ... => boolean

Returns a non-false value if the input set is a subset (that is, equal to or a strict subset of) the remaining sets. If only the input set is provided, a non-false value is returned, otherwise the set is tested to be a subset of each remaining set in left to right order. As soon as the set is not a subset of any one argument, #f is returned.

procedure: *-add * value => %
procedure: *-add! * value => %

Adds a value to the set.

procedure: *-delete * value => %
procedure: *-delete! * value => %

Removes the given value from the set.

procedure: *-union * sets ... => %
procedure: *-union! * sets ... => %

Performs set union of the input set (first argument) with zero or more other sets. The resulting set contains the values of all input sets. The sets are unioned in pairwise fashion from left to right.

procedure: *-intersection * sets ... => %
procedure: *-intersection! * sets ... => %

Performs set intersection of the input set (first argument) with zero or more other sets. The resulting set contains the values common to all input sets. The sets are intersected in pairwise fashion from left to right.

procedure: *-difference * sets ... => %
procedure: *-difference! * sets ... => %

Performs set difference between the first input set and the union of all subsequent sets. The resulting set contains the values of the input set which appear in no subsequent set. Conceptually, the sets are subtracted from left to right in pairwise fashion.

procedure: *-symmetric-difference * set => %
procedure: *-symmetric-difference! * set => %

Performs set symmetric difference (also known as xor) between the two input sets. The resulting set contains the values of the input sets which are not in both sets. This operator is equivalent to (*-difference (*-union * set) (*-insersection * set)).

procedure: *-add-from * bag => %
procedure: *-add-from! * bag => %

Adds all the values in the given bag to the input set.

procedure: *-delete-from * bag => %
procedure: *-delete-from! * bag => %

Removes all the values in the given bag from the input set.

Sequences

procedure: sequence? value => boolean

Returns a non-false value if the provided value is a sequence.

procedure: *-ref * integer [absence-thunk] => value

Returns the value stored in the input sequence at the exact integer index provided. If the index is outside the range of the sequence, absence-thunk is invoked if present and its value returned instead. It is an error if the index is out of range and no absence-thunk is provided.

procedure: *-get-left * [absence-thunk]=> value

Returns the value at index 0 in the collection. If the sequence is empty, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-get-right * [absence-thunk]=> value

Returns the value at index (- (sequence-size *) 1) in the collection. If the sequence is empty, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-insert-right * value=> %
procedure: *-insert-right! * value=> %

Adds a value to the end of the sequence.

procedure: *-set * integer value => %
procedure: *-set! * integer value => %

Replaces the value stored in the input sequence at the exact integer index provided with the given value. It is an error to reference an index outside the range of the sequence.

procedure: *-add * value => %
procedure: *-add! * value => %

Adds the given value to the end of the input sequence.

procedure: *-replace-from * dest-start source-sequence [source-start [source-end]] => %
procedure: *-replace-from! * dest-start source-sequence [source-start [source-end]] => %

Replaces several values in the input sequence with values from from the source sequence. dest-start is the exact integer first index in the input sequence which will be replaced by the copied values. source-start is the exact integer index which starts the subsequence of values which will be copied over the input sequence. If omitted, it defaults to 0. If provided, source-end specifies the first index above source-start which is not copied. If omitted, its value defaults to (sequence-size source-sequence). In either case, (- source-end source-start) values will be copied. It is an error if the number of values being copied to the dest-start and subsequent indices exceeds the size of the input sequence. In short, this function proceeds to replace the values of the input sequence from indices [dest-start, (dest-start + count)) with values from the source sequence from indices [source-start, source-end), where count is (- source-end source-start). This copy is shallow, that is, values are copied to the input sequence in a way that preserves object identity.

procedure: *-copy * [start [end]] => %

Creates a new sequence whose type and contents are the same as the input sequence passed as an operand, but which is distinct enough in storage that the new sequence cannot be affected by modifications to the input sequence and vice versa. start and end are optional exact integer arguments. If start is provided the new sequence will contain the values of the input sequence starting from that offset and continuing to the end of the input sequence. If start and end are provided, the new sequence will contain the values from indices [start, end). In either case the selected values appear in the new sequence from index 0.
This copy is shallow, that is, values are copied to the new sequence in a way that preserves object identity.

Flexible Sequences

procedure: flexible-sequence? value => boolean

Returns a non-false value if the provided value is a flexible sequence.

procedure: *-insert * index value => %
procedure: *-insert! * index value => %

Inserts the provided value at the given index in the flexible sequence. If index is not equal to the collection-size of the sequence, this will result in the current value at index and subsequent values' indices to increase by one.

procedure: *-delete-at * index => %
procedure: *-delete-at! * index => %

Deletes the current value at the given index in the flexible sequence. If index is not equal to the element at index (- (collection-size *) 1) of the sequence, this will result in the subsequent values' indices to decrease by one, filling the newly created gap.

procedure: *-insert-left * value=> %
procedure: *-insert-left! * value=> %

Inserts a value into the flexible sequence at position 0.

procedure: *-insert-right * value=> %
procedure: *-insert-right! * value=> %

Inserts a value into the flexible sequence at position (collection-size seq).

procedure: *-delete-left * => % value
procedure: *-delete-left! * => % value

Removes the value at position 0 in the flexible sequence, returning two values, the updated collection and the value removed.

procedure: *-delete-right * => % value
procedure: *-delete-right! * => % value

Removes the value at position (- (flexible-sequence-size *) 1), returning two values, the updated collection and the value removed.

Maps

procedure: map? value => boolean

Returns a non-false value if the provided value is a map.

procedure: % pair ... => %

Constructs a % map with zero or more bindings provided as its initial values. Each operand (after an equivalence and/or ordering function if present) to a map constructor must be a Scheme pair whose car is the new key and whose cdr is the value to which the key will map. If two mappings are provided with the same key, it is undefined which will remain in the map.

procedure: *-equivalence-function * => procedure

Returns the value equivalence function for the input map.

procedure: *-key-equivalence-function * => procedure

Returns the key equivalence function for the input map.

procedure: *-contains-key? * key => boolean

Returns a non-false value if the input map has an entry for the given key, or false if it does not.

procedure: *-keys->list * => list

Returns a newly allocated list containing the keys of the map. This can be done trivially with enumeration, but this procedure is provided if an implementation can more efficiently perform this operation directly. If the map is ordered, the list must contain the keys of the collection in the same order as a left key enumeration.

procedure: *-get * key [absence-thunk] => value

Retrieves the value mapped by the given key. If no such mapping exists, absence-thunk is invoked if present and its value returned, otherwise #f is returned.

procedure: *-put * key value [absence-thunk]=> % value
procedure: *-put! * key value [absence-thunk]=> % value

Replaces the previous mapping from the given key with a mapping from the same key to the provided value. The updated map is returned as the first value. If no previous mapping existed for the given key and absence-thunk provided, it is invoked and its value is returned as the second return value. If not provided, the second value is #f.
If a previous mapping existed, the value of the replaced mapping is returned as the second value.

procedure: *-update * key func [absence-thunk] => %
procedure: *-update! * key func [absence-thunk] => %

Updates the value of the mapping from the provided key key to the results of applying the single argument function func to the previous value. If no previous mapping existed, absence-thunk is invoked if present and its value passed to func otherwise #f is passed.

procedure: *-delete * key => %
procedure: *-delete! * key => %

Removes an existing mapping for the given key. If no previous mapping existed, this call does nothing.

procedure: *-delete-from * bag => %
procedure: *-delete-from! * bag => %

Removes all mappings from keys found in the given bag, returning the updated map.

procedure: *-add-from * source-map => %
procedure: *-add-from! * source-map => %

Adds all the mappings in the source map to the input map. If a mapping exists in the source map and the input map, the mapping from the source will replace that of the destination.
Scheme Collections

This SRFI additionally specifies a Collections API based around the Scheme list, the association list, the vector, and string types. These types are compatible with those from R5RS. Where procedures from this SRFI intersect with those from R5RS, the SRFI-44 procedures are extensions of their standard counterparts. It is intended that the collection types provided here be usable both by the collections operators and other standard Scheme programs.

Lists

Scheme List collections are flexible sequences. In the reference implementation provided by this SRFI, lists have an unstable enumeration.

procedure: make-list [size [default]] => list

Creates a new list. If size is provided, it will start out with size instances of the value given as default. If default is not provided, the contents of the list are undefined.
procedure: list value ... => list

procedure: list-fold-left list fold-function seeds ... => seeds ...
procedure: list-fold-right list fold-function seeds ... => seeds ...

procedure: list-equivalence-function list => procedure

procedure: list-copy list [start [end]] => list
procedure: list->list list => list

procedure: list? value => boolean
procedure: list-size list => exact integer
procedure: list-empty? list => boolean

procedure: list-contains? list value => boolean

procedure: list-ref list integer [absence-thunk] => value
procedure: list-get-any list [absence-thunk] => value
procedure: list-get-left list [absence-thunk] => value
procedure: list-get-right list [absence-thunk] => value

procedure: list-count list value => exact integer

procedure: list= elt= lists ... => boolean

procedure: list-add list value => list
procedure: list-add! list value => list

procedure: list-set list integer value => list
procedure: list-set! list integer value => list

procedure: list-insert-left list value => list
procedure: list-insert-left! list value => list
procedure: list-insert-right list value => list
procedure: list-insert-right! list value => list

procedure: list-delete list value => list
procedure: list-delete! list value => list

procedure: list-delete-left list => list value
procedure: list-delete-left! list => list value
procedure: list-delete-right list => list value
procedure: list-delete-right! list => list value

procedure: list-delete-all list value => list
procedure: list-delete-all! list value => list

procedure: list-add-from list bag => list
procedure: list-add-from! list bag => list

procedure: list-delete-from list bag => list
procedure: list-delete-from! list bag => list

procedure: list-delete-all-from list bag => list
procedure: list-delete-all-from! list bag => list

procedure: list-replace-from dest-start source-list [source-start [source-end]] => list
procedure: list-replace-from! dest-start source-list [source-start [source-end]] => list

procedure: list-clear list => list
procedure: list-clear! list => list

procedure: list-insert list index value => list
procedure: list-insert! list index value => list

procedure: list-delete-at list index => list
procedure: list-delete-at! list index => list

Association List Maps
Scheme association lists implement a superset of the Map functionality (they may store more than one key/value pair with the same key). The operations below operate on alists as maps, enforcing a single instance of each key. Because a future SRFI may specify M:N mappings (dictionaries), we call the alist version of a SRFI-44 map an "alist-map". In the reference implementation provided by this SRFI, the alist-map has an unstable enumeration.

procedure: make-alist-map [equivalence-function] => alist-map
procedure: alist-map [equivalence-function] (key . value) ... => alist-map

procedure: alist-map-fold-left alist-map fold-function seeds ... => seeds ...
procedure: alist-map-fold-right alist-map fold-function seeds ... => seeds ...
procedure: alist-map-fold-keys-left alist-map fold-function seeds ... => seeds ...
procedure: alist-map-fold-keys-right alist-map fold-function seeds ... => seeds ...

procedure: alist-map-equivalence-function alist-map => procedure
procedure: alist-map-key-equivalence-function alist-map => procedure

procedure: alist-map-count alist-map value => exact integer
procedure: alist-map-key-count alist-map value => exact integer
procedure: alist-map-contains-key? alist-map value => boolean
procedure: alist-map-size alist-map => exact integer

procedure: alist-map-copy alist-map => alist-map
procedure: alist-map->list alist-map => list
procedure: alist-map-keys->list alist-map => list

procedure: alist-map? value => boolean

procedure: alist-map= elt= alist-maps ... => boolean

procedure: alist-map-get alist-map key => value
procedure: alist-map-get-all alist-map key => list

procedure: alist-map-put alist-map key value => alist-map value
procedure: alist-map-put! alist-map key value => alist-map value
procedure: alist-map-replace-all alist-map key value ... => alist-map
procedure: alist-map-replace-all! alist-map key value ... => alist-map

procedure: alist-map-update alist-map key func [absence-thunk]=> alist-map
procedure: alist-map-update! alist-map key func [absence-thunk]=> alist-map
procedure: alist-map-update-all alist-map key func [absence-thunk]=> alist-map
procedure: alist-map-update-all! alist-map key func [absence-thunk]=> alist-map

procedure: alist-map-delete alist-map key => alist-map
procedure: alist-map-delete! alist-map key => alist-map

procedure: alist-map-delete-all alist-map key => alist-map
procedure: alist-map-delete-all! alist-map key => alist-map

procedure: alist-map-delete-from alist-map bag => alist-map
procedure: alist-map-delete-from! alist-map bag => alist-map

procedure: alist-map-delete-all-from alist-map bag => alist-map
procedure: alist-map-delete-all-from! alist-map bag => alist-map

procedure: alist-map-clear alist-map => alist-map
procedure: alist-map-clear! alist-map => alist-map

procedure: alist-map-add-from alist-map dict => alist-map
procedure: alist-map-add-from! alist-map dict => alist-map

Vectors

Scheme Vectors are purely mutable limited sequences. In the reference implementation provided by this SRFI, vectors have an unstable enumeration. The functional update procedures of vectors likely require a vector clone, and would therefore impose an efficiency penalty over the side-effecting update procedures.

procedure: make-vector size [default] => vector

Creates a new vector with the provided size. If default is not provided, the contents of the vector are undefined.
procedure: vector value ... => vector

procedure: vector-fold-left vector fold-function seeds ... => seeds ...
procedure: vector-fold-right vector fold-function seeds ... => seeds ...

procedure: vector-equivalence-function vector => procedure

procedure: vector-copy vector [start [end]] => vector
procedure: vector->list vector => list

procedure: vector? value => boolean
procedure: vector-size vector => exact integer
procedure: vector-empty? vector => boolean

procedure: vector-contains? vector value => boolean

procedure: vector-count vector value => exact integer

procedure: vector-ref vector index [absence-thunk] => value
procedure: vector-get-any vector [absence-thunk] => value
procedure: vector-get-left vector [absence-thunk] => value
procedure: vector-get-right vector [absence-thunk] => value

procedure: vector-set vector index value => vector
procedure: vector-set! vector index value => vector

procedure: vector-replace-from dest-start source-vector [source-start [source-end]] => vector
procedure: vector-replace-from! dest-start source-vector [source-start [source-end]] => vector

procedure: vector= elt= vectors ... => boolean

Strings

Scheme strings are purely mutable limited homogeneous sequences which can only store characters. In the reference implementation provided by this SRFI, strings have an unstable enumeration. The functional update procedures of strings likely require a vector clone, and would therefor impose an efficiency penalty over the side-effecting update procedures.

procedure: make-string size [default-character] => string
procedure: string character ... => string

procedure: string-fold-left string fold-function seeds ... => seeds ...
procedure: string-fold-right string fold-function seeds ... => seeds ...

procedure: string-equivalence-function string => procedure

procedure: string-copy string [start [end]]=> string
procedure: string->list string => list

procedure: string-size string => exact integer
procedure: string-empty? string => boolean

procedure: string? value => boolean

procedure: string-contains? string character => boolean

procedure: string-count string character => exact integer

procedure: string-ref string index [absence-thunk] => value
procedure: string-get-any string [absence-thunk] => value
procedure: string-get-left string [absence-thunk] => value
procedure: string-get-right string [absence-thunk] => value

procedure: string-set string index character => string
procedure: string-set! string index character => string

procedure: string-replace-from dest-start source-string [source-start [source-end]] => string
procedure: string-replace-from! dest-start source-string [source-start [source-end]] => string

procedure: string= elt= strings ... => boolean

Compatibility

Some of the names defined in this SRFI uses the same or similar name in existing SRFIs or implementations. Some of them are defined so that they are extension of the existing practice, and some of them have incompatibilities.

Extensions to R5RS

R5RS doesn't specify the return value of string-set! and vector-set!. This SRFI specifies them to return the passed collection. string-ref, vector-ref and list-ref are extended to take an optional argument, the absence-thunk.

Relation to other SRFIs

string= has different interface from string= in SRFI-13, which has the following API:
     (string= string1 string2 [start1 end1 start2 end2])
If the implementation wishes to support both SRFIs, it may dispatch by its argument types. Note that SRFI-13's string=, while having the naming convention also used by SRFIs 1 and 43, has a different purpose: it is rather an extension of R5RS's string=? (note the question mark) with a modified name.

string-copy is a superset, while string-count is a subset of the same operators from SRFI-13.

*-count from this SRFI differs from count in SRFI-1. This SRFI specifies no SRFI-1 compatible count as it could not be significantly more efficient than a fold.

string-contains? has a similar name to string-contains from SRFI-13, but the latter looks for substring match, instead of element-wise match.

Both arguments of make-list in this SRFI are optional. Only the last argument in SRFI-1 is optional.

Relation to Scheme implementations

Some implementations (e.g. PLT) use absence-thunk to give a default value in ?-ref or ?-get API. Other implementations (e.g. Chicken, STk, Gauche) instead allow an optional value to specify a default value. This SRFI takes the thunk approach due to its additional flexiblility not only to produce a default value, but to do so through an arbitrary computation. There are various other extensions and incompatibilities sprinkled among implementations. For example, Gauche has extended versions of string-ref, vector-ref and list-ref, and has a string-size operator which returns the number of bytes a multi-byte string occupies.

Implementation

This SRFI behaves more as a meta-SRFI for future SRFIs to define concrete collection types. However, a reference implementation for the collections wrapper around the common Scheme types is provided in the archive srfi-44.tgz. It requires SRFI's 1 (list-lib), 23 (error), and the Tiny-CLOS portable object system to provide type dispatch.

Future Work

This SRFI does not specify any concrete collections beyond those which already exist in Scheme. Obvious future work involves specifying a useful set of collections for modern Scheme programming.

Additionally, sorting and searching are deliberately left out of this SRFI to avoid complexity. Sorting and searching would be useful for a great number of applications. Resurrecting SRFI-32 and reforming it in the context of this SRFI may be a useful endeavour.

Earlier drafts of this SRFI included not only the M:1 Map collection, but an M:N Dictionary collection which could map a key to numerous values. This collection was removed because it was not clear that we knew what actually constituted an optimal interface for such a collection. The text of the specification at the time of its removal is included in this document, but commented out. It may make a good starting point for a Dictionary SRFI to build on this specification.

Because this SRFI leaves the dispatch strategy open to implementers, it is impossible for a collection implementer to add a new collection to a system supporting this SRFI without knowledge of the mechanisms used. It may be worthwile to standardize an interface for adding collections to a SRFI-44 compliant system that is independent of the dispatch system.

Finally, some interest was shown in several extensions which allow for various other programming paradigms and in extensions which improve the programmer's ability to write very efficient code while remaining agnostic of collection type, at the expense of some simplicity. Rather than support all of these requests in this SRFI with limited study, it was decided to keep the SRFI simple and non-conflicting with these ideas. Future work may take up these issues and devote the proper time which they deserve.

References

Much of this design is influenced by other languages collection interfaces. Below are links to other languages' collections interfaces, many of which contributed to design decisions.

The generic enumerations protocol was championed by Oleg Kiselyov and his work on enumeration:

General ways to Traverse Collections [Feb 2003]

Copyright

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Scheme Request For Implementation process or editors, except as needed for the purpose of developing SRFIs in which case the procedures for copyrights defined in the SRFI process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the authors or their successors or assigns.

This document and the information contained herein are provided on an "AS IS" basis and THE AUTHOR AND THE SRFI EDITORS DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNISS FOR A PARTICULAR PURPOSE.

Author: Scott G. Miller

Editor: Francisco Solsona

Last modified: Sun Jan 28 13:40:19 MET 2007

Title

Author

Status

Abstract

Table of Contents

Collection

Bag

Set

Sequence

Flexible Sequence

Map

Copyright