250: Insertion-ordered hash tables

by John Cowan and Daphne Preston-Kendal, based on prior SRFIs by Will Clinger and Panu Kalliokoski

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-250@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This SRFI defines an interface to hash tables, which are widely recognized as a fundamental data structure for a wide variety of applications. A hash table is a data structure that:

Unlike the hash tables of SRFI 125, which is the direct ancestor of this specification, the hash tables described here are ordered by insertion: that is, associations inserted earlier in the history of the hash table appear earlier in the ordering. Advances in the implementations of hash tables, as provided by C++, Python, JavaScript, etc., make the provision of this new facility practical. As a result, the hash tables of this SRFI do not necessarily interoperate with the hash tables of SRFI 125, SRFI 126, or existing R6RS implementations.

Issues

None at present.

Rationale

Hash tables themselves don’t really need defending: almost all dynamically typed languages, from awk to JavaScript to Lua to Perl to Python to Common Lisp, and including many Scheme implementations, provide them in some form as a fundamental data structure. Therefore, what needs to be defended is not the data structure but the procedures. This SRFI supports a great many convenience procedures on top of the basic hash table interfaces provided by SRFI 69 and R6RS. Modulo the question of association ordering, nothing in it adds power to what those interfaces provide, but it does add convenience in the form of pre-debugged routines to do various common things, and even some things not so commonly done but useful.

There is no support for thread safety or weakness.

This specification depends on SRFI 128 comparators, which package a type test, an equality predicate, and a hash function into a single bundle.

The relatively few hash table procedures in R6RS are all available in this SRFI under somewhat different names. This SRFI adopts SRFI 69’s spelling hash-table rather than R6RS’s hashtable, because of the universal use of ‘hash table’ rather than ‘hashtable’ in other computer languages and in technical prose generally.* Besides, the English word hashtable obviously means something that can be ... hashted. It would be trivial to provide the R6RS names on top of this SRFI.

Common Lisp compatibility

As usual, the Common Lisp names are completely different from the Scheme names. Common Lisp provides the following capabilities that are not in this SRFI:

Sources

The procedures in this SRFI are drawn primarily from SRFI 69 and R6RS. In addition, the following sources are acknowledged:

The procedures hash-table-empty?, hash-table-empty-copy, hash-table-pop!, hash-table-map!, hash-table-intersection!, hash-table-difference!, and hash-table-xor! were added for convenience and completeness.

The native hash tables of MIT, SISC, Bigloo, Scheme48, SLIB, RScheme, Scheme 7, Scheme 9, Rep, and FemtoLisp were also investigated, but no additional procedures were incorporated.

The hash-table-keys, hash-table-values, hash-table-entries (from SRFI 69) and the corresponding vector-based versions (used in R6RS) have been removed from this SRFI in favour of the cursor-based iteration interface.

Pronunciation

The slash in the names of some procedures can be pronounced ‘with’.

Editorial conventions

This SRFI uses a convention from the Racket documentation, which extends the usual Scheme specification use of ellipsis (‘...’) in procedure entry headers.

Namely, when two specified formals are followed by a pair of ellipses, it means there must be any even number of arguments in sequence (including zero arguments). The zeroth, second, fourth, etc. actual arguments are treated as the values of the first specified formal, and the first, third, fifth, etc. actual arguments as the corresponding values of the second of the specified formals.

The following formal names used in the specification of procedures imply the type of their actual arguments.

hash-table

A hash table as defined by this SRFI. It is an assertion violation if the argument is not a hash table object originally returned from one of the procedures in this SRFI, or by an implementation-specified set of additional constructor procedures. In particular, it is implementation-specified whether the hash table types defined by R6RS, SRFI 69, SRFI 125, or SRFI 126 will be interoperable with the procedures in this SRFI. If they are, the hash tables provided in those libraries must also be insertion-ordered, although this SRFI makes no guarantees about how any of the procedures of those libraries will affect the insertion order.

k

An exact nonnegative integer representing the initial capacity of a hash table being created (that is, the number of associations it can hold without having to grow). If not present, the initial capacity of any hash table is unspecified. An implementation may be significantly more efficient in time and/or memory if it is given a correct value of k when a hash table is created (meaning a value which actually corresponds to the maximum size of the hash table throughout its existence).

comparator

A SRFI 128 comparator with a hash function, which will be used to provide a hash function, equality predicate, and potentially type test for a hash table being created. It is an assertion violation to pass an object which is not a comparator, or a comparator without a hash function, as the value of a comparator argument.

key

An object representing the key of an association or potential association within a hash table. Implementations should signal an assertion violation if any value of key does not satisfy the type test of the hash table’s comparator, but this is not required.

In particular, it is unspecified whether this is an assertion violation:

(let ((ht (make-hash-table
           (make-comparator integer? = #f number-hash))))
  (hash-table-set! ht 1/2 "one half"))

Because while the comparator specifies integer? as its type test, the equality predicate = and hash function number-hash will work on any number object, including the non-integer 1/2. It is expected that the majority of comparators’ equality predicates and hash functions will themselves signal assertion violations if their input is the wrong type.

In cases where multiple key arguments are provided to procedures which mutate the hash table, it is unspecified whether, if some but not all of the given keys are not the right type for the hash table’s comparator, any of the mutations specified for the other keys will have taken place before an assertion violation is raised or before the hash function or equality predicate is called on the problematic key. However, such cases must not leave the hash table object in an internally inconsistent state, and the hash table object following such a case must be in a state which represents the successful completion of some number of mutations (potentially zero) that were correctly specified by the call to the mutating hash table procedure.

Unless otherwise specified, in cases where multiple key arguments are provided to procedures which mutate a hash table or create a new one, it is unspecified whether providing the same key (in the sense of the hash table’s comparator’s hash function and equality predicate) multiple times will be an assertion violation, or if an arbitrary one of the provided keys (and potential associated values) will be used. In the latter case, if a new association is created for the given keys, it is also unspecified if the association’s position in the insertion order will represent the same position as in the sequence of arguments for the value which is actually chosen, or another position.

value

An object representing the value of an association or potential association within a hash table. The hash table implementation imposes no requirements on the type of such an object.

cursor

A hash table cursor as defined by the section ‘Low-level iteration’. As specified in that section, incorrect use of a hash table cursor, including passing a cursor of the wrong type, is undefined behaviour.

Specification

The procedures in this SRFI are in the (srfi 250) library (or (srfi :250 hash-tables) on R6RS).

Hash tables may be mutable or immutable. All hash tables created by the procedures in this SRFI are mutable by default, except hash tables created by the hash-table-copy procedure with the mutable? argument set of #f. It is an assertion violation to attempt to mutate an immutable hash table, whether by adding new associations, deleting associations, or changing the value of any association.

All references to ‘executing in expected amortized constant time’ presuppose that a satisfactory hash function is available. Arbitrary or impure hash functions can make a hash of any implementation.

Hash tables are allowed to cache the results of calling the equality predicate and hash function, so programs cannot rely on the hash function being called exactly once for every primitive hash table operation: it may be called zero, one, or more times.

It is undefined behaviour if the procedure argument of hash-table-find, hash-table-count, hash-table-map, hash-table-for-each, hash-table-map!, hash-table-map->list, hash-table-fold, hash-table-fold-left, hash-table-fold-right, or hash-table-prune! mutates the hash table being walked.

Implementations are permitted to ignore user-specified hash functions in certain circumstances. Specifically, if the equality predicate, whether passed as part of a comparator or explicitly, is more fine-grained (in the sense of R7RS-small section 6.1) than equal?, the implementation is free — indeed, is encouraged — to ignore the user-specified hash function and use something implementation-dependent. This allows the use of addresses as hashes, in which case the keys must be rehashed if they are moved by the garbage collector. Such a hash function is unsafe to use outside the context of implementation-provided hash tables. It can of course be exposed by an implementation as an extension, with suitable warnings against inappropriate uses.

It is undefined behaviour to mutate a key during or after its insertion into a hash table in such a way that the hash function of the table will return a different result when applied to that key.

Unless otherwise specified, procedures whose names end in ! return unspecified values.

Index

Constructors

Note: The examples in the subsequent sections of this SRFI assume that the example variable definitions in this section have been run. (The mutations of the defined hash tables do not accumulate between examples, though.)

(make-hash-table comparator)
(make-hash-table comparator k)

Returns a newly allocated hash table whose equality predicate and hash function are extracted from comparator.

It is an assertion violation if comparator is not a hash comparator.

As mentioned above, implementations are free to use an appropriate implementation-dependent hash function instead of the specified hash function, provided that the specified equality predicate is a refinement of the equal? predicate. This applies whether the hash function and equality predicate are passed as separate arguments or packaged up into a comparator.

The constraints on equality predicates and hash functions are given in SRFI 128.

(R6RS make-eq-hashtable, make-eqv-hashtable, and make-hashtable; Common Lisp make-hash-table)

Example:

(define tiny-table
  (make-hash-table (make-comparator number? = #f number-hash)))
;; tiny-table => #<an empty hash table whose future keys could be numbers>

(hash-table comparator key0 value0 ... ...)

Returns a newly allocated hash table, created as if by make-hash-table using comparator. For each pair of arguments, an association is added to the new hash table with key as its key and value as its value.

Example:

(define suits-table
  (hash-table (make-comparator symbol? symbol=? #f symbol-hash)
              'clubs #\x2663
              'diamonds #\x2666
              'hearts #\x2665
              'spades #\x2660))
;; suits-table => #<a hash table with four associations,
;;                  mapping the names of the suits of cards
;;                  as symbols to the Unicode codepoints of
;;                  their designs as characters>

(hash-table-unfold stop? mapper successor seed comparator)
(hash-table-unfold stop? mapper successor seed comparator k)

Create a new hash table as if by make-hash-table using comparator (and, if given, the value of k). If the result of applying the predicate stop? to seed is true, return the hash table. Otherwise, apply the procedure mapper to seed. Mapper returns two values, which are inserted into the hash table as the key and the value respectively. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm. The associations are inserted in left-to-right order beginning with the result of the first call to mapper.

If multiple calls to the mapper return the same key, it is unspecified whether it is an assertion violation or whether one of the values will be chosen. In this case, the position of the association for that key in the insertion order is also unspecified.

This procedure may not be continuation-safe.

Example:

(define alphabet-table
  (hash-table-unfold (lambda (c) (char>? c #\z))
                     (lambda (c) (values c (char-upcase c)))
                     (lambda (c) (integer->char (+ 1 (char->integer c))))
                     #\a
                     char-comparator
                     26))
;; alphabet-table => #<a hash table mapping the 26 lowercase
                      Basic Latin letters to their uppercase
                      counterparts>

Because the above example provides a value of k which is correct for the final size of the hash table, the unfold operation should be more efficient and the resulting hash table use less memory than if k had not been provided, or if it had been too large or too small. (Unless, of course, the implementation’s default value of k happens to be exactly 26.)

(alist->hash-table alist comparator)
(alist->hash-table alist comparator k)

Returns a newly allocated hash-table as if by make-hash-table using comparator and the optional k value. It is then initialized from the associations of alist. Key-value pairings are stored in the created hash table in reverse order to the one in which they appear in the input alist, and, in the case of duplicate keys, associations earlier in the list take precedence over those that come later.

Example:

(define telephone-numbers-table
  (alist->hash-table '((116123 . emotional-support)
                       (116117 . medical-advice)
                       (112 . emergency)
                       (112 . ambulance)
                       (112 . fire)
                       (110 . police))
                     (make-comparator exact-integer? = #f number-hash)))
;; telephone-numbers-table => #<a hash table mapping some
                                useful German telephone numbers
                                in ascending order; 112 is mapped
                                to the symbol emergency>

Predicates

(hash-table? obj)

Returns #t if obj is a hash table, and #f otherwise. (R6RS hashtable?; Common Lisp hash-table-p)

Examples:

(hash-table? tiny-table)  ;=> #t
(hash-table? suits-table) ;=> #t
(hash-table? '((this-is . an-alist) (not-a . hash-table))) ;=> #f

(hash-table-contains? hash-table key)

Returns #t if there is any association to key in hash-table, and #f otherwise. Must execute in expected amortized constant time.

(R6RS hashtable-contains?)

Examples:

(hash-table-contains? alphabet-table #\q) ;=> #t
(hash-table-contains? alphabet-table #\&) ;=> #f

(hash-table-empty? hash-table)

Returns #t if hash-table contains no associations, and #f otherwise.

Example:

(hash-table-empty? tiny-table)  ;=> #t
(hash-table-empty? suits-table) ;=> #f

(hash-table-mutable? hash-table)

Returns #t if the hash table is mutable. (R6RS hashtable-mutable?)

Examples:

(hash-table-mutable? tiny-table) ;=> #t
(hash-table-mutable? (hash-table-copy alphabet-table #f))
                                 ;=> #f

Accessors

The following procedures, given a key, return the corresponding value.

(hash-table-ref hash-table key)
(hash-table-ref hash-table key failure)
(hash-table-ref hash-table key failure success)

Extracts the value associated to key in hash-table, invokes the procedure success on it, and returns its result; if success is not provided, then the value itself is returned. If key is not contained in hash-table and failure is supplied, then failure is invoked on no arguments and its result is returned. Otherwise, it is an assertion violation. Must execute in expected amortized constant time, not counting the time to call the success and failure procedures.

Examples:

(hash-table-ref suits-table 'hearts) ;=> #\x2665
(hash-table-ref suits-table 'joker)  ; assertion violation
(hash-table-ref suits-table 'joker (lambda () 'ha-ha))
                                     ;=> ha-ha

(hash-table-ref suits-table
                'hearts
                (lambda () (assertion-violation #f "no love for hearts?!"))
                (lambda (char) (show #t (as-red char))))
                                     ; shows a red heart character

(hash-table-ref suits-table
               'joker
                (lambda () 'foo!)
                (lambda (char) (show #t (as-red char))))
                                     ;=> foo!

(hash-table-ref/default hash-table key default)

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-ref hash-table key (lambda () default))

(R6RS hashtable-ref; Common Lisp gethash)

Examples:

(hash-table-ref/default suits-table 'joker 'ha-ha)  ;=> ha-ha
(hash-table-ref/default suits-table 'hearts 'ha-ha) ;=> #\x2665

(hash-table-comparator hash-table)

Returns a hash comparator whose equality function and hash function are equivalent to the ones of the comparator provided when the hash table was constructed.

It is unspecified whether the returned comparator includes a type test function or ordering predicate. The returned comparator object may or may not be the same (in the sense of eqv?) as the comparator which was provided when the hash table was constructed.

Rationale: An implementation may wish to extract the hash and equivalence functions from a comparator and store them directly, rather than indirectly through the comparator. If it did so, it would have no reason to also store the ordering predicate and may not need to store the type test either. But it would be impossible to implement the R6RS (rnrs hashtables (6)) library in terms of this library without a means of inspection.

Example: The hash-table-empty-copy procedure could be implemented as follows.

(define (hash-table-empty-copy ht)
  (make-hash-table (hash-table-comparator ht)
                   (hash-table-size ht)))

Mutators

The following procedures alter the associations in a hash table either unconditionally or conditionally on the presence or absence of a specified key.

(hash-table-add! hash-table key1 value1 ... ...)

Repeatedly mutates hash-table, creating new associations in it by processing the arguments from left to right. For each of the pairs of keys and values, a new association is created at the end of the hash table ordering, associating the key with the value.

Must execute in expected amortized constant time per key.

It is an assertion violation if any of the keys already has an association in the hash-table. It is also an assertion violation if the same key (in the sense of the hash table’s hash function and equality predicate) is given multiple times.

Examples:

(define irc-alphabet-table (hash-table-copy alphabet-table #t))

;; IRC is a Finnish invention. In Finland, the punctuation characters
;; {, |, and } had uppercase variants [, \, and ] until the 1990s, when
;; this convention was abolished by ISO 8859 and 10646. However, the IRC
;; protocol continues to consider these characters to be case variants
;; of one another.
(hash-table-add! irc-alphabet-table #\{ #\[
                                    #\| #\\
                                    #\} #\])

(hash-table-add! alphabet-table #\w #\Ƿ) ; assertion violation

(hash-table-replace! hash-table key1 value1 ... ...)

Repeatedly mutates hash-table, replacing the value in each association for the given key with the corresponding value.

Must execute in expected amortized constant time per key.

It is an assertion violation if any of the keys does not have an association in the hash-table.

Examples:

;; Replace the red-coloured suit symbols with their hollow variants
(hash-table-replace! suits-table 'hearts #\x2661
                                 'diamonds #\x2662)

(hash-table-replace! alphabet-table #\þ #\Þ) ; assertion violation

(hash-table-set! hash-table key1 value1 ... ...)

Repeatedly mutates hash-table, creating new associations in it by processing the arguments from left to right. Newly created associations are added to the end of the hash table ordering. However, if there is a previous association for any key, its value is updated to the given value and the corresponding association remains in the same position in the ordering.

Must execute in expected amortized constant time per key.

Note: The hash-table-add! and hash-table-replace! procedures should often be used in preference to this procedure, in order to more faithfully represent the expectation either that a new association will be created for each key, or that each key will already have an extant association. In cases where an ‘upsert’ operation is intended, the hash-table-intern! and hash-table-update! procedures are also often more expressive than the equivalent operation expressed directly in terms of hash-table-set!.

Examples: The above examples of hash-table-add! with the irc-alphabet-table and hash-table-replace! with the suits-table work identically if those procedures are replaced by hash-table-set!. In addition, the assertion violation examples of those procedures with the alphabet-table are not violations if hash-table-set! is used instead:

(hash-table-set! alphabet-table #\w #\Ƿ)
(hash-table-set! alphabet-table #\þ #\Þ)

(hash-table-delete! hash-table key ...)

Deletes any association to each key in hash-table and returns the number of keys that had associations. If any key is given more than once, it is unspecified whether it is counted more than once in the return value.

Must execute in expected amortized constant time per key.

R6RS hashtable-delete! and Common Lisp remhash do not handle multiple associations.

Example:

(hash-table-delete! alphabet-table #\a #\e #\i #\o #\u #\{ #\| #\}) ;=> 5

(hash-table-intern! hash-table key failure)

Effectively invokes hash-table-ref with the given arguments and returns what it returns. If key was not found in hash-table, its value is set to the result of calling failure and that value is returned. Must execute in expected amortized constant time.

It is an assertion violation to use this procedure on an immutable hash table, even if the key has an association in the hash table.

Examples:

(hash-table-intern! alphabet-table #\z (lambda () #\Ȝ))
                      ;=> #\Z, and the hash table is unchanged
(hash-table-intern! alphabet-table #\ȝ (lambda () #\Ȝ))
                      ;=> #\Ȝ, and the hash table is updated
                      ;   with a new association from lowercase
                      ;   to uppercase yogh

(hash-table-update! hash-table key updater)
(hash-table-update! hash-table key updater failure)
(hash-table-update! hash-table key updater failure success)

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref hash-table key failure success)))

Must execute in expected amortized constant time.

Example:

(hash-table-update! tiny-table
                    12
                    (lambda (x) (+ x 1))
                    (lambda () (hash-table-size tiny-table))
                    values)
               ; tiny-table now associates the key 12 with 1

(hash-table-update!/default hash-table key updater default)

Semantically equivalent to, but may be more efficient than, the following code:

(hash-table-set! hash-table key (updater (hash-table-ref/default hash-table key default)))

(R6RS hashtable-update!)

Must execute in expected amortized constant time.

Example:

(hash-table-update!/default tiny-table
                            12
                            (lambda (x) (+ x 1))
                            0)
               ; tiny-table now associates the key 12 with 1

(hash-table-pop! hash-table)

Chooses the last, most recently added association from hash-table and removes it, returning the key and value as two values.

Must execute in expected amortized constant time.

It is an assertion violation if hash-table is empty.

Examples:

(hash-table-pop! alphabet-table) ;=> #\z #\Z, and alphabet-table no
                                 ;   longer contains this association

(hash-table-pop! tiny-table) ; assertion violation

(hash-table-clear! hash-table)

Delete all the associations from hash-table. The implementation may assume that the hash-table will later be re-filled with the same number of associations. (R6RS hashtable-clear!; Common Lisp clrhash)

Example:

(hash-table-clear! suits-table)
(hash-table-empty? suits-table) ;=> #t

The whole hash table

These procedures process the associations of the hash table in insertion order.

(hash-table-size hash-table)

Returns the number of associations in hash-table as an exact integer. Must execute in constant time. (R6RS hashtable-size; Common Lisp hash-table-count.)

Examples:

(hash-table-size tiny-table)     ;=> 0
(hash-table-size suits-table)    ;=> 4
(hash-table-size alphabet-table) ;=> 26

(hash-table= same? hash-table1 hash-table2)

Returns #t if hash-table1 and hash-table2 have the same keys (in the sense of their common equality predicate) and each key has the same value (in the sense of the same? procedure), and #f otherwise.

It is an assertion violation if the equality predicates of hash-table1 and hash-table2 are not the same in the sense of the eqv? procedure. On R6RS implementations where eqv? is not usefully defined on procedures by the implementation, this assertion violation is not required to be raised.

Examples:

(define alphabet-table*
  (hash-table-map (lambda (k v) (char-downcase v)) alphabet-table))

(hash-table= char-ci=? alphabet-table alphabet-table*) ;=> #t
(hash-table= char=? alphabet-table alphabet-table*)    ;=> #f

(hash-table-find proc hash-table failure)

For each association of hash-table, invoke proc on its key and value. If proc returns true, then hash-table-find returns what proc returns. If all the calls to proc return #f, return the result of invoking the thunk failure.

Examples:

(hash-table-find (lambda (number service)
                   (and (eq? service 'police)
                        number))
                 telephone-numbers-table
                 (lambda () 1312))
               ;=> 110

(hash-table-find (lambda (number service)
                   (and (eq? service 'private-detective)
                        number))
                 telephone-numbers-table
                 (lambda () 'no-such-service))
               ;=> no-such-service

(hash-table-count pred hash-table)

For each association of hash-table, invoke pred on its key and value. Return the number of calls to pred which returned true.

Example:

(hash-table-count (lambda (number service) (even? number))
                  telephone-numbers-table)
               ;=> 2

Low-level iteration

This section introduces the hash table cursor, a low-level mechanism for iterating over the associations in a hash table.

A hash table cursor is a Scheme object of an unspecified type, not guaranteed to be disjoint from any other Scheme type. It represents a particular key-value association within a hash table, and the ability to find a new cursor representing the association which comes immediately before or after that in the list associations. A hash table cursor can also be in an end state, in which case it does not represent any key-value association. Any given hash table cursor has limits on its spatial and temporal validity:

It is undefined behaviour to use a hash table cursor in any way which violates these limitations.

Examples: The procedures hash-table-fold-left and hash-table-fold-right could be implemented as follows.

(define (hash-table-fold-left proc seed ht)
  (let loop ((cur (hash-table-cursor-first ht))
             (acc seed))
    (if (hash-table-cursor-at-end? ht cur)
        acc
        (loop (hash-table-cursor-next ht cur)
              (proc acc
                    (hash-table-cursor-key ht cur)
                    (hash-table-cursor-value ht cur))))))

(define (hash-table-fold-right proc seed ht)
  (let loop ((cur (hash-table-cursor-last ht))
             (acc seed))
    (if (hash-table-cursor-at-end? ht cur)
        acc
        (loop (hash-table-cursor-previous ht cur)
              (proc (hash-table-cursor-key ht cur)
                    (hash-table-cursor-value ht cur)
                    acc)))))

The procedure hash-table-map! could be implemented as follows.

(define (hash-table-map! proc ht)
  (let loop ((cur (hash-table-cursor-first ht)))
    (if (hash-table-cursor-at-end? ht cur)
        ht
        (let ((new-val
               (call-with-values
                   (lambda () (hash-table-cursor-key+value ht cur))
                 proc)))
          (hash-table-cursor-value-set! ht cur new-val)
          (loop (hash-table-cursor-next ht cur))))))

(hash-table-cursor-first hash-table)
(hash-table-cursor-last hash-table)

Return a hash table cursor pointing, respectively, at the first or last association in the given hash-table.

Must execute in expected amortized constant time.

(hash-table-cursor-for-key hash-table key)

Returns a hash table cursor pointing at the association for the given key in the hash table. If there is no association for the key, returns a hash table cursor in the end state.

Must execute in expected amortized constant time.

(hash-table-cursor-next hash-table cursor)

Returns a hash table cursor pointing to the association in the given hash-table which comes immediately after the association referred to by the input cursor. Must execute in expected amortized constant time.

If the given cursor refers to the last association in the hash table, the returned hash table cursor is in the end state.

If the given cursor is already in the end state, it is undefined behaviour.

(hash-table-cursor-previous hash-table cursor)

Returns a hash table cursor pointing to the association in the given hash-table which comes immediately before the association referred to by the input cursor. Must execute in expected amortized constant time.

If the given cursor refers to the first association in the hash table, the returned hash table cursor is in the end state.

If the given cursor is already in the end state, it is undefined behaviour.

(hash-table-cursor-key hash-table cursor)
(hash-table-cursor-value hash-table cursor)

Return, respectively, the key or value of the association in the hash-table to which the given cursor refers. Must execute in expected amortized constant time.

If the given cursor is in the end state, it is undefined behaviour.

(hash-table-cursor-key+value hash-table cursor)

Returns two values: the key and value of the association in the hash-table to which the given cursor refers. Must execute in expected amortized constant time.

If the given cursor is in the end state, it is undefined behaviour.

(hash-table-cursor-value-set! hash-table cursor value)

Replaces the value of the association in the hash-table to which the given cursor refers with value. Must execute in expected amortized constant time.

If the given cursor is in the end state or if hash-table is immutable, it is undefined behaviour.

(hash-table-cursor-at-end? hash-table cursor)

Returns #t if the given cursor is in the end state in the given hash-table, and #f otherwise. Must execute in expected amortized constant time.

Mapping and folding

These procedures process the associations of the hash table in insertion order, unless otherwise noted.

(hash-table-map proc hash-table)

Returns a newly allocated hash table as if by (hash-table-empty-copy hash-table). Calls proc for every association in hash-table with the key and value of the association. The key of the association and the result of invoking proc are entered into the new hash table. Note that this is not the result of lifting mapping over the domain of hash tables, but it is considered more useful.

Example:

(hash-table-map (lambda (k v) (string v k)) alphabet-table)
               ;=> #<a hash table mapping each lowercase
                     letter of the basic Latin alphabet to
                     a string containing that letter in
                     uppercase and lowercase forms>

(hash-table-map! proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is used to update the value of the association. Returns hash-table.

Example:

(hash-table-map! (lambda (k v)
                   (char-upcase
                    (string-ref (symbol->string k) 0)))
                 suits-table)
               ;=> #<the suits-table, which now maps
                    the names of suits as symbols to the
                    uppercase first letters of their names
                    as characters>

(hash-table-for-each proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The value returned by proc is discarded. Returns an unspecified value.

Example:

(hash-table-for-each (lambda (k v) (display k)) alphabet-table)
               ; displays ‘abcdefghijklmnopqrstuvwxyz’

See also the example of hash-table-empty-copy.

(hash-table-map->list proc hash-table)

Calls proc for every association in hash-table with two arguments: the key of the association and the value of the association. The values returned by the invocations of proc are accumulated into a list in the insertion order of the hash-table, which is returned.

Example:

(hash-table-map->list (lambda (k v) (string v k))
                      alphabet-table)
               ;=> ("Aa" "Bb" "Cc" "Dd" ...)

(hash-table-fold proc seed hash-table)

Calls proc for every association in hash-table with three arguments: the key of the association, the value of the association, and an accumulated value val. Val is seed for the first invocation of procedure, and for subsequent invocations of proc, the returned value of the previous invocation. The value returned by hash-table-fold is the return value of the last invocation of proc.

The proc is invoked for the associations in an unspecified order. To fold over the associations in order, see the next entries.

Rationale: An implementation may be able to provide more efficient iteration in an unspecified order than in insertion order when the order is not significant for the proc.

Example: See the example for hash-table-fold-right, except that the resulting list when using hash-table-fold may be in any order and not alphabetical.

(hash-table-fold-left proc seed hash-table)

Calls proc for every association in hash-table, in order from oldest to newest, with three arguments: an accumulated value val, the key of the association, and the value of the association. Val is seed for the first invocation of procedure, and for subsequent invocations of proc, the returned value of the previous invocation. The value returned by hash-table-fold is the return value of the last invocation of proc.

Example:

(hash-table-fold-left (lambda (ls k v)
                        (cons v ls))
                      '()
                      suits-table)
          ;=> (#\x2660 #\x2665 #\x2666 #\x2663)

(hash-table-fold-right proc seed hash-table)

Calls proc for every association in hash-table, in order from oldest to newest, with three arguments: the key of the association, the value of the association, and an accumulated value val. Val is seed for the first invocation of procedure, and for subsequent invocations of proc, the returned value of the previous invocation. The value returned by hash-table-fold is the return value of the last invocation of proc.

Example:

(hash-table-fold-right (lambda (k v ls)
                         (cons k ls))
                       '()
                       alphabet-table)
          ;=> (#\a #\b #\c #\d ...)

(hash-table-prune! proc hash-table)

Calls proc for every association in hash-table with two arguments, the key and the value of the association, and removes all associations from hash-table for which proc returns true. Returns the number of associations that were removed.

Example:

(hash-table-prune! (lambda (k v) (even? k))
                   telephone-numbers-table)
               ;=> 2, and telephone-numbers-table now contains only
                   associations for 116117 and 116123

Copying and conversion

(hash-table-copy hash-table)
(hash-table-copy hash-table mutable?)

Returns a newly allocated hash table with the same properties and associations as hash-table. If the second argument is present and is true, the new hash table is mutable. Otherwise it is immutable provided that the implementation supports immutable hash tables. (R6RS hashtable-copy)

Example: See the examples for hash-table-mutable? and hash-table-add!

(hash-table-empty-copy hash-table)

Returns a newly allocated mutable hash table with the same comparator as hash-table, but with no associations. The implementation may assume that the returned hash table will eventually contain as many associations as does the original hash-table.

Example: The procedure hash-table-map could be implemented as follows.

(define (hash-table-map proc ht)
  (let ((new-ht (hash-table-empty-copy ht)))
    (hash-table-for-each (lambda (k v)
                           (hash-table-add! new-ht k (proc k v)))
                         ht)
    new-ht))

(hash-table->alist hash-table)

Returns an alist with the same associations as hash-table in reverse insertion order.

Example:

(hash-table->alist telephone-numbers-table)
               ;=> ((116123 . emotional-support)
                    (116117 . medical-advice)
                    (112 . emergency)
                    (110 . police))

Hash tables as sets

The following examples assume the irc-alphabet-table from the example of hash-table-add! has been defined.

(hash-table-union! hash-table1 hash-table2)

Adds the associations of hash-table2 to hash-table1 and returns hash-table1. If a key appears in both hash tables, its value is set to the value appearing in hash-table1. Returns hash-table1.

Example:

(hash-table-union! alphabet-table irc-alphabet-table)
                ;=> alphabet-table, now also containing the
                ;   additional associations from irc-alphabet-table

(hash-table-intersection! hash-table1 hash-table2)

Deletes the associations from hash-table1 whose keys don’t also appear in hash-table2 and returns hash-table1.

Example:

(hash-table-intersection! alphabet-table irc-alphabet-table)
                ;=> alphabet-table, now empty

(hash-table-difference! hash-table1 hash-table2)

Deletes the associations of hash-table1 whose keys are also present in hash-table2 and returns hash-table1.

Example:

(hash-table-difference! alphabet-table irc-alphabet-table)
                ; alphabet-table, now only containing associations
                ; for #\{, #\|, and #\}

(hash-table-xor! hash-table1 hash-table2)

Deletes the associations of hash-table1 whose keys are also present in hash-table2, and then adds the associations of hash-table2 whose keys are not present in hash-table1 to hash-table1. Returns hash-table1.

Example:

(hash-table-xor! alphabet-table
                 (hash-table char-comparator
                             #\a #\A
                             #\e #\E
                             #\i #\I
                             #\o #\O
                             #\u #\U
                             #\{ #\[
                             #\| #\\
                             #\} #\]))
                ;=> alphabet-table, now only containing entries for
                ;   consonants and Fennoscandian vocalic punctuation

Implementation

The sample implementation is in the repository of this SRFI.

It runs on R7RS and R6RS Schemes and requires a small additional prelude. Suitable ‘generic’ preludes for any R7RS Small and R6RS implementations are provided. You can tune or rewrite the prelude easily for what your particular Scheme implementation provides. As examples, the R6RS prelude is tuned for Chez Scheme, and there is an additional alternative prelude for Guile based on the R6RS one.

Running on R7RS Small, it requires the (scheme case-lambda) library and SRFIs 1 (list library), 128 (comparators), 151 (bitwise operations), and 160 (homogeneous numeric vectors). SRFI 160 could also easily be replaced by SRFI 4. All assertion violations are raised as R7RS Small ‘error objects’.

Running on R6RS, it requires SRFIs 128 (comparators) and 133 (vector library). A future version may eliminate the minor dependency on SRFI 133.

In the form in which the sample implementation is distributed in the repository, the hash tables of the sample implementation are disjoint from any other existing hash table implementation. However, it would be trivial to provide the R6RS hash table library or any of the other SRFI hash table libraries in terms of this SRFI and therefore make those libraries interoperable within a particular Scheme implementation.

Implementation techniques

The sample implementation uses a technique apparently due to Raymond Hettinger and tested in PyPy before being adopted by CPython.

With this strategy, the actual ‘hash table’ in the old-fashioned sense is an open-addressed array which contains index values into another array of the actual entries in order. Because the index values are typically small, they can be stored as 8-bit or 16-bit numbers in the majority of cases, upgrading their size as more entries are added, going to 32 and 64 bits for the very largest of hash tables. This makes the vast majority of real-world hash tables considerably smaller in memory footprint than in traditional implementations, and insertion ordering of entries is a nice side-effect.

In this implementation technique, there is no performance benefit to using the unordered hash-table-fold over the ordered variants hash-table-fold-left and hash-table-fold-right.

A disadvantage of this approach is that deletion of hash table entries (other than the most recently added) can only be achieved by replacing an entry in the insertion-ordered array with a special sentinel value which later needs to be cleaned up. Applications which often add and delete keys may notice a speed penalty. Benchmarks show moderate gains in performance compared to classical hash tables for other access patterns, however, due to better use of the CPU cache.

Another option is to store hash table entries on a doubly-linked list. This was the original technique used when ordered hash tables were made part of the Ruby language in version 1.9, but the implementation was changed in Ruby 3.0 to use the technique described above. With this technique, unordered iteration can make more efficient use of the CPU cache than the ordered variants. Deletion is no less efficient than on a typical hash table implementation. Such an implementation could also provide an extension library allowing adding entries at the beginning of the insertion order as well as at the end, or even in the middle.

The need to store and manipulate the extra pair of pointers on each entry to maintain insertion order makes this technique significantly worse in memory use and slightly worse in speed than a hash table without insertion ordering. A clever implementation in C might be able to work around this.

It is not recommended to attempt to implement this SRFI by bundling an existing non-ordering hash table implementation and a list together into a record type.

Acknowledgements

Some of the language of this SRFI is copied from SRFI 69 with thanks to its author, Panu Kalliokoski. However, he is not responsible for what I have done with it.

I also acknowledge the members of the SRFI 250, 125, 126, and 128 mailing lists, especially Takashi Kato, Alex Shinn, Shiro Kawai, and Per Bothner.

Footnotes

* Historical note: The draft versions of R6RS up to and including R5.92RS also used the hyphenated hash-table form. It was, puzzlingly, changed in response to formal comment 215, which argued for hyphenating the name bytevector, not against hyphenating hash-table.

© 2023 John Cowan, Will Clinger, Daphne Preston-Kendal.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler