250: Insertion-ordered hash tables

by John Cowan (shepherd, text), Will Clinger (text), Daphne Preston-Kendal (implementation)

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-250@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This SRFI defines an interface to hash tables, which are widely recognized as a fundamental data structure for a wide variety of applications. A hash table is a data structure that:

Unlike the hash tables of SRFI 125, which is the direct ancestor of this specification, the hash tables described here are ordered by insertion: that is, associations inserted earlier in the history of the hash table appear earlier in the ordering. Advances in the implementations of hash tables, as provided by C++, Python, JavaScript, etc., make the provision of this new facility practical. As a result, the hash tables of this SRFI do not interoperate with the hash tables of SRFI 125, SRFI 126, or existing R6RS implementations.

Issues

Rationale

Hash tables themselves don't really need defending: almost all dynamically typed languages, from awk to JavaScript to Lua to Perl to Python to Common Lisp, and including many Scheme implementations, provide them in some form as a fundamental data structure. Therefore, what needs to be defended is not the data structure but the procedures. This SRFI supports a great many convenience procedures on top of the basic hash table interfaces provided by SRFI 69 and R6RS. Modulo the question of association ordering, nothing in it adds power to what those interfaces provide, but it does add convenience in the form of pre-debugged routines to do various common things, and even some things not so commonly done but useful.

There is no support for thread safety or weakness.

This specification depends on SRFI 128 comparators, which package a type test, an equality predicate, and a hash function into a single bundle.

The relatively few hash table procedures in R6RS are all available in this SRFI under somewhat different names. This SRFI adopts SRFI 69's spelling hash-table rather than R6RS's hashtable, because of the universal use of "hash table" rather than "hashtable" in other computer languages and in technical prose generally. Besides, the English word hashtable obviously means something that can be ... hashted. It would be trivial to provide the R6RS names on top of this SRFI.

Common Lisp compatibility

As usual, the Common Lisp names are completely different from the Scheme names. Common Lisp provides the following capabilities that are not in this SRFI:

Sources

The procedures in this SRFI are drawn primarily from SRFI 69 and R6RS. In addition, the following sources are acknowledged:

The procedures hash-table-empty?, hash-table-empty-copy, hash-table-pop!, hash-table-map!, hash-table-intersection!, hash-table-difference!, and hash-table-xor! were added for convenience and completeness.

The native hash tables of MIT, SISC, Bigloo, Scheme48, SLIB, RScheme, Scheme 7, Scheme 9, Rep, and FemtoLisp were also investigated, but no additional procedures were incorporated.

Pronunciation

The slash in the names of some procedures can be pronounced "with".

Specification

The procedures in this SRFI are in the (srfi 250) library (or (srfi :250) on R6RS).

All references to "executing in expected amortized constant time" presuppose that a satisfactory hash function is available. Arbitrary or impure hash functions can make a hash of any implementation.

Hash tables are allowed to cache the results of calling the equality predicate and hash function, so programs cannot rely on the hash function being called exactly once for every primitive hash table operation: it may be called zero, one, or more times.

It is an error if the procedure argument of hash-table-find, hash-table-count, hash-table-map, hash-table-for-each, hash-table-map!, hash-table-map->list, hash-table-fold or hash-table-prune! mutates the hash table being walked.

It is an error to pass two hash tables that have different (in the sense of eq?) comparators to any of the procedures of this SRFI.

Implementations are permitted to ignore user-specified hash functions in certain circumstances. Specifically, if the equality predicate, whether passed as part of a comparator or explicitly, is more fine-grained (in the sense of R7RS-small section 6.1) than equal?, the implementation is free — indeed, is encouraged — to ignore the user-specified hash function and use something implementation-dependent. This allows the use of addresses as hashes, in which case the keys must be rehashed if they are moved by the garbage collector. Such a hash function is unsafe to use outside the context of implementation-provided hash tables. It can of course be exposed by an implementation as an extension, with suitable warnings against inappropriate uses.

It is an error to mutate a key during or after its insertion into a hash table in such a way that the hash function of the table will return a different result when applied to that key.

Index

Constructors

Note that the argument k is a positive integer representing the initial capacity of the hashtable being created (that is, the number of associations it can hold without having to grow). If not present, the initial capacity is implementation-dependent.

(make-hash-table comparator [ k ])

Returns a newly allocated hash table whose equality predicate and hash function are extracted from comparator.

As mentioned above, implementations are free to use an appropriate implementation-dependent hash function instead of the specified hash function, provided that the specified equality predicate is a refinement of the equal? predicate. This applies whether the hash function and equality predicate are passed as separate arguments or packaged up into a comparator.

The constraints on equality predicates and hash functions are given in SRFI 128.

(R6RS make-eq-hashtable, make-eqv-hashtable, and make-hashtable; Common Lisp make-hash-table)

(hash-table comparator [ key value ] ...)

Returns a newly allocated hash table, created as if by make-hash-table using comparator. For each pair of arguments, an association is added to the new hash table with key as its key and value as its value. This procedure returns an immutable hash table. If the same key (in the sense of the equality predicate) is specified more than once, it is an error.

(hash-table-unfold stop? mapper successor seed comparator [ k ])

Create a new hash table as if by make-hash-table using comparator and the args. If the result of applying the predicate stop? to seed is true, return the hash table. Otherwise, apply the procedure mapper to seed. Mapper returns two values, which are inserted into the hash table as the key and the value respectively. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.

(alist->hash-table alist comparator [ k ])

Returns a newly allocated hash-table as if by make-hash-table using comparator and the args. It is then initialized from the associations of alist. Associations earlier in the list take precedence over those that come later.

Predicates

(hash-table? obj)

Returns #t if obj is a hash table, and #f otherwise. (R6RS hashtable?; Common Lisp hash-table-p)

(hash-table-contains? hash-table key)

Returns #t if there is any association to key in hash-table, and #f otherwise. Must execute in expected amortized constant time. (R6RS hashtable-contains?)

(hash-table-empty? hash-table)

Returns #t if hash-table contains no associations, and #f otherwise.

(hash-table=? value-comparator hash-table1 hash-table2)

Returns #t if hash-table1 and hash-table2 have the same keys (in the sense of their common equality predicate) and each key has the same value (in the sense of value-comparator), and #f otherwise.

(hash-table-mutable? hash-table)

Returns #t if the hash table is mutable. (R6RS hashtable-mutable?)

Accessors

The following procedures, given a key, return the corresponding value.

(hash-table-ref hash-table key [ failure [ success ] ])

Extracts the value associated to key in hash-table, invokes the procedure success on it, and returns its result; if success is not provided, then the value itself is returned. If key is not contained in hash-table and failure is supplied, then failure is invoked on no arguments and its result is returned. Otherwise, it is an error. Must execute in expected amortized constant time, not counting the time to call the procedures.

(hash-table-ref/default hash-table key default)

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-ref hash-table key (lambda () default))

(R6RS hashtable-ref; Common Lisp gethash)

(hash-table-comparator hash-table)

Returns a comparator equivalent to the comparator with which hash-table was created. R6RS hashtable-equivalence-function and hashtable-hash-function.

Mutators

The following procedures alter the associations in a hash table either unconditionally or conditionally on the presence or absence of a specified key. It is an error to add an association to a hash table whose key does not satisfy the type test predicate of the comparator used to create the hash table.

(hash-table-set! hash-table arg ...)

Repeatedly mutates hash-table, creating new associations in it by processing the arguments from left to right. The args alternate between keys and values. However. if there is a previous association for a key, its value is updated. It is an error if the type check procedure of the comparator of hash-table, when invoked on a key, does not return #t. Likewise, it is an error if a key is not a valid argument to the equality predicate of hash-table. Returns an unspecified value. Must execute in expected amortized constant time per key. R6RS hashtable-set! and Common Lisp (setf gethash) do not handle multiple associations.

(hash-table-delete! hash-table key ...)

Deletes any association to each key in hash-table and returns the number of keys that had associations. Must execute in expected amortized constant time per key. R6RS hashtable-delete! and Common Lisp remhash do not handle multiple associations.

(hash-table-intern! hash-table key failure)

Effectively invokes hash-table-ref with the given arguments and returns what it returns. If key was not found in hash-table, its value is set to the result of calling failure. Must execute in expected amortized constant time.

(hash-table-update! hash-table key updater [ failure [ success ] ])

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref hash-table key failure success)))

Must execute in expected amortized constant time. Returns an unspecified value.

(hash-table-update!/default hash-table key updater default)

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref/default hash-table key default)))

(R6RS hashtable-update!)

Must execute in expected amortized constant time. Returns an unspecified value.

(hash-table-pop! hash-table)

Chooses the first association from hash-table and removes it, returning the key and value as two values.

It is an error if hash-table is empty.

(hash-table-clear! hash-table)

Delete all the associations from hash-table. (R6RS hashtable-clear!; Common Lisp clrhash)

The whole hash table

These procedures process the associations of the hash table in insertion order.

(hash-table-size hash-table)

Returns the number of associations in hash-table as an exact integer. Must execute in constant time. (R6RS hashtable-size; Common Lisp hash-table-count.)

(hash-table-keys hash-table)
(hash-table-key-vector hash-table)

Returns a newly allocated list/vector of all the keys in hash-table. R6RS hashtable-keys returns a vector.

(hash-table-values hash-table)
(hash-table-value-vector hash-table) R6RS hashtable-values returns a vector.

Returns a newly allocated list/vector of all the keys in hash-table.

(hash-table-entries hash-table)
(hash-table-entry-vectors hash-table)

Returns two values, a newly allocated list/vector of all the keys in hash-table and a newly allocated list/vector of all the values in hash-table in the corresponding order. R6RS hash-table-entries returns vectors.

(hash-table-find proc hash-table failure)

For each association of hash-table, invoke proc on its key and value. If proc returns true, then hash-table-find returns what proc returns. If all the calls to proc return #f, return the result of invoking the thunk failure.

(hash-table-count pred hash-table)

For each association of hash-table, invoke pred on its key and value. Return the number of calls to pred which returned true.

Mapping and folding

These procedures process the associations of the hash table in insertion order.

(hash-table-map proc comparator hash-table)

Returns a newly allocated hash table as if by (make-hash-table comparator). Calls proc for every association in hash-table with the value of the association. The key of the association and the result of invoking proc are entered into the new hash table. Note that this is not the result of lifting mapping over the domain of hash tables, but it is considered more useful.

If comparator recognizes multiple keys in the hash-table as equivalent, any one of such associations is taken.

(hash-table-for-each proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is discarded. Returns an unspecified value.

(hash-table-map! proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is used to update the value of the association. Returns an unspecified value.

(hash-table-map->list proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The values returned by the invocations of proc are accumulated into a list, which is returned.

(hash-table-fold proc seed hash-table)

Calls proc for every association in hash-table with three arguments: the key of the association, the value of the association, and an accumulated value val. Val is seed for the first invocation of procedure, and for subsequent invocations of proc, the returned value of the previous invocation. The value returned by hash-table-fold is the return value of the last invocation of proc.

(hash-table-prune! proc hash-table)

Calls proc for every association in hash-table with two arguments, the key and the value of the association, and removes all associations from hash-table for which proc returns true. Returns the number of associations that were removed.

Copying and conversion

(hash-table-copy hash-table [ mutable? ])

Returns a newly allocated hash table with the same properties and associations as hash-table. If the second argument is present and is true, the new hash table is mutable. Otherwise it is immutable provided that the implementation supports immutable hash tables. (R6RS hashtable-copy)

(hash-table-empty-copy hash-table)

Returns a newly allocated mutable hash table with the same properties as hash-table, but with no associations.

(hash-table->alist hash-table)

Returns an alist with the same associations as hash-table in insertion order.

Hash tables as sets

(hash-table-union! hash-table1 hash-table2)

Adds the associations of hash-table2 to hash-table1 and returns hash-table1. If a key appears in both hash tables, its value is set to the value appearing in hash-table1. Returns hash-table1.

(hash-table-intersection! hash-table1 hash-table2)

Deletes the associations from hash-table1 whose keys don't also appear in hash-table2 and returns hash-table1.

(hash-table-difference! hash-table1 hash-table2)

Deletes the associations of hash-table1 whose keys are also present in hash-table2 and returns hash-table1.

(hash-table-xor! hash-table1 hash-table2)

Deletes the associations of hash-table1 whose keys are also present in hash-table2, and then adds the associations of hash-table2 whose keys are not present in hash-table1 to hash-table1. Returns hash-table1.

Implementation

The current sample implementation is unfinished, and is here. When it is finished, it will be moved into this SRFI's repo. It relies upon SRFI 128.

Acknowledgements

Some of the language of this SRFI is copied from SRFI 69 with thanks to its author, Panu Kalliokoski. However, he is not responsible for what I have done with it.

I also acknowledge the members of the SRFI 250, 125, 126, and 128 mailing lists, especially Takashi Kato, Alex Shinn, Shiro Kawai, and Per Bothner.

© 2023 John Cowan, Will Clinger, Daphne Preston-Kendal.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler