mailto:srfi-69@srfi.schemers.org
.
See instructions
here to subscribe to the list. You can access previous messages via
the
archive of the mailing list.
This SRFI specifies an API for basic hash tables. Hash tables are data
structures that provide a mapping from some set of keys to some set of
values associated to those keys. Hash tables have no intrinsic order
for these associations, and allow key lookup and destructive updating in
amortised constant time.
Issues
There is no single best way to make hash tables. The tables presented
in this SRFI aim at being both conceptually simple and usable for a wide
variety of applications. Even though a portable implementation is
provided, Scheme implementations can speed things up considerably by
e.g. providing an internal hash function for symbols. Moreover, almost
every Scheme implementation already has some kind of low-level hash
table functionality, because that's the natural way to implement
the global namespace, and specifically, to provide support for
string->symbol. There might be some benefit in integration
between implementation-specific namespace data types and the hash table
API presented here; however, these issues are left open.
Rationale
Hash tables are widely recognised as a fundamental data structure for many kinds of computational tasks. Almost every non-minimal Scheme implementation provides some kind of hash table functionality.
Alas, although somewhat similar, these hash table APIs have many differences: some trivial, like the naming of certain functions; some complex, like revealing different aspects of the internal implementation to the user; some coarse, like requiring keys to be of some specific type(s); some subtle, like requiring the user to guess the size of the hash table in advance to get optimal performance. As a result, the easiest way to write portable Scheme programs that use hash tables is to implement some kind of hash tables in the program itself, based on vectors.
The primary aim of this SRFI is to provide a simple and generic hash
table API that will answer most of users' needs for basic usage of hash
tables; the reference implementation just shows one way of meeting the
requirements of the API.
Specification
Names defined in this SRFI:
Procedure: make-hash-table [ comparison ] [ hash ] [ sizehint ] → hash-table
Create a new hash table with no associations. comparison is a predicate that should accept two keys and return a boolean telling whether they denote the same key value; it defaults to equal?.
hash is a hash function, and defaults to an appropriate hash function for the given comparison predicate (see section Hashing). However, an acceptable default is not guaranteed to be given for any comparison function coarser than equal?, except for string-ci=?. A comparison function c1 is coarser than a comparison function c2 if there exist values x and y such that (and (c1 x y) (not (c2 x y))). The function hash must be acceptable for comparison, so if you use a coarser comparison than equal? (other than string-ci=?), you must always provide the function hash yourself. The hash function is always called with an appropriate bound parameter.
sizehint, when given, can be used to directly make the hash table of good size for sizehint associations, but may be ignored.
Procedure: make-symbol-hash-table [ sizehint ] → hash-table
Procedure: make-string-hash-table [ sizehint ] → hash-table
Procedure: make-string-ci-hash-table [ sizehint ] → hash-table
Procedure: make-integer-hash-table [ sizehint ] → hash-table
These procedures are provided as shorthands for creating hash tables whose keys are symbols, strings, case-insensitive strings, and integers, respectively.
Procedure: hash-table? obj → boolean
A predicate to test whether a given object obj is a hash table. The hash table type should be disjoint from all other types, if possible.
Procedure: list->hash-table alist [ comparison ] [ hash ] → hash-table
Takes an association list
alist and creates a hash table
hash-table which maps the car of every element in alist to the
cdr of corresponding elements in alist. comparison and hash
are interpreted as in make-hash-table. If some key occurs multiple
times in alist, it is unspecified which of the corresponding values
will end up in hash-table. (Note: the choice of using cdr
(instead of cadr) for values tries to strike balance between the two
approaches: using cadr would rend this procedure unusable for
cdr alists, but not vice versa.)
Procedure: hash-table-ref hash-table key [ default ] → value
Procedure: hash-table-get hash-table key [ default ] → value
These procedures return the value associated to key in hash-table. If no value is associated is associated to key, default is returned if given; if not, an error is signalled. This operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table. (Note: this rules out implementation by association lists or fixed-length hash tables.) (Note: the name hash-table-get is provided for compatibility.)
Procedure: hash-table-set! hash-table key value → undefined
Procedure: hash-table-put! hash-table key value → undefined
These procedures set the value associated to key in hash-table. The previous association (if any) is removed. This operation should have an amortised complexity of O(1) with respect to the number of associations in hash-table. (Note: this rules out implementation by association lists or fixed-length hash tables.) (Note: the name hash-table-put! is provided for compatibility.)
Procedure: hash-table-delete! hash-table key → undefined
Procedure: hash-table-remove! hash-table key → undefined
These procedures remove any association to key in hash-table. It is not an error if no association for that key exists; in this case, nothing is done. This operation should have an amortised complexity of O(1) with respect to the number of associations in hash-table. (Note: this rules out implementation by association lists or fixed-length hash tables.) (Note: the name hash-table-remove! is provided for compatibility.)
Procedure: hash-table-exists? hash-table key → boolean
This predicate tells whether there is any association to key in hash-table. This operation should have an amortised complexity of O(1) with respect to the number of associations in hash-table. (Note: this rules out implementation by association lists or fixed-length hash tables.)
Procedure: hash-table-count hash-table → integer
Returns the number of associations in hash-table. This operation must have a complexity of O(1) with respect to the number of associations in hash-table.
Procedure: hash-table-keys hash-table → list
Returns a list of keys in hash-table. The order of the keys is unspecified.
Procedure: hash-table-values hash-table → list
Returns a list of values in hash-table. The order of the values is unspecified, and is not guaranteed to match the order of keys in the result of hash-table-keys.
Procedure: hash-table-for-each hash-table proc → unspecified
proc should be a function taking two arguments, a key and a value. This procedure calls proc for each association in hash-table, giving the key of the association as key and the value of the association as value. The results of proc are discarded. The order in which proc is called for the different associations is unspecified.
(Note: in some implementations, there is a procedure called hash-table-map which does the same as this procedure. However, in other implementations, hash-table-map does something else. In no implementation that I know of, hash-table-map does a real functorial map that lifts an ordinary function to the domain of hash tables. Because of these reasons, hash-table-map is left outside this SRFI.)
Procedure: hash-table-fold hast-table f init-value → final-value
This procedure calls f for every association in hash-table with
three arguments: the key of the association key, the value of the
association value, and an accumulated value
, val. val is
init-value for the first invocation of f, and for subsequent
invocations of f, the return value of the previous invocation of f.
The value final-value returned by hash-table-fold is the return
value of the last invocation of f. The order in which f is called
for different associations is unspecified.
Procedure: hash-table->list hash-table → alist
Returns an association list such that the car of each element in alist is a key in hash-table and the corresponding cdr of each element in alist is the value associated to the key in hash-table. The order of the elements is unspecified.
hash-table->list and list->hash-table are meant to be inverses, modulo the fact that association lists may have multiple values associated to a key, whereas hash tables have internal knowledge about their comparison function. (Therefore, no rigorous definition.)
Hashing means the act of taking some value and producing a number from the value. A hash function is a function that does this. Every comparison predicate comparison has a set of acceptable hash functions for that predicate; a hash funtion hash is acceptable iff (comparison obj1 obj2) → (= (hash obj1) (hash obj2)).
A hash function h is good for a comparison predicate comparison if it distributes the result numbers (hash values) for non-equal objects (by comparison) as uniformly as possible over the numeric range of hash values, especially in the case when some (non-equal) objects resemble each other by e.g. having common subsequences. This definition is vague but should be enough to assert that e.g. a constant function is not a good hash function.
When the definition of make-hash-table above talks about an
appropriate
hashing function for comparison, it means a hashing
function that gives decent performance (for the hashing operation) while
being both acceptable and good for comparison. This definition, too,
is intentionally vague.
Procedure: hash object [ bound ] → integer
Produces a hash value for object in the range ( 0, bound (. If bound is not given, the implementation is free to choose any bound, given that the default bound is greater than the size of any imaginable hash table in a normal application. (This is so that the implementation may choose some very big value in fixnum range for the default bound.) This hash function is acceptable for equal?.
Procedure: string-hash string [ bound ] → integer
The same as hash, except that the argument string must be a string.
Procedure: string-ci-hash string [ bound ] → integer
The same as string-hash, except that the case of characters in string does not affect the hash value produced.
Procedure: symbol-hash symbol [ bound ] → integer
The same as hash, except that the argument symbol must be a symbol.
Implementations are encouraged to provide this function as a builtin.
Implementation
This implementation relies on SRFI-9 for distinctness of the hash table type, and on SRFI-23 for error reporting. Otherwise, the implementation is pure R5RS.
(define (%string-hash s ch-conv bound) (let ((hash 31) (len (string-length s))) (do ((index 0 (+ index 1))) ((>= index len) hash) (set! hash (modulo (+ (* 37 hash) (char->integer (ch-conv (string-ref s index)))) bound))))) (define (string-hash s . maybe-bound) (let ((bound (if (null? maybe-bound) *default-bound* (car maybe-bound)))) (%string-hash s (lambda (x) x) bound))) (define (string-ci-hash s . maybe-bound) (let ((bound (if (null? maybe-bound) *default-bound* (car maybe-bound)))) (%string-hash s char-downcase bound))) (define (symbol-hash s . maybe-bound) (let ((bound (if (null? maybe-bound) *default-bound* (car maybe-bound)))) (%string-hash (symbol->string s) (lambda (x) x) bound))) (define (hash obj . maybe-bound) (let ((bound (if (null? maybe-bound) *default-bound* (car maybe-bound)))) (cond ((integer? obj) (modulo obj bound)) ((string? obj) (string-hash obj bound)) ((symbol? obj) (symbol-hash obj bound)) ((real? obj) (modulo (+ (numerator obj) (denominator obj)) bound)) ((number? obj) (modulo (+ (hash (real-part obj)) (* 3 (hash (imag-part obj)))) bound)) ((char? obj) (modulo (char->integer obj) bound)) ((vector? obj) (vector-hash obj bound)) ((pair? obj) (modulo (+ (hash (car obj)) (* 3 (hash (cdr obj)))) bound)) ((null? obj) 0) ((not obj) 0) ((procedure? obj) (error "hash: procedures cannot be hashed" obj)) (else 1)))) (define (vector-hash v bound) (let ((hashvalue 571) (len (vector-length v))) (do ((index 0 (+ index 1))) ((>= index len) hashvalue) (set! hashvalue (modulo (+ (* 257 hashvalue) (hash (vector-ref v index))) bound))))) (define %make-hash-node cons) (define %hash-node-set-value! set-cdr!) (define %hash-node-key car) (define %hash-node-value cdr) (define-record-type :hash-table (%make-hash-table count hash compare associate entries) hash-table? (count hash-table-count hash-table-set-count!) (hash hash-table-hash-function) (compare hash-table-comparison-function) (associate hash-table-association-function) (entries hash-table-entries hash-table-set-entries!)) (define *default-table-size* 64) (define (make-hash-table . args) (let* ((comparison (if (null? args) equal? (car args))) (hash (or (and (not (null? args)) (not (null? (cdr args))) (cadr args)) (and (eq? comparison string=?) string-hash) (and (eq? comparison string-ci=?) string-ci-hash) hash)) (size (if (or (null? args) (null? (cdr args)) (null? (cddr args))) *default-table-size* (caddr args))) (association (or (and (eq? comparison eq?) assq) (and (eq? comparison eqv?) assv) (and (eq? comparison equal?) assoc) (letrec ((associate (lambda (val alist) (cond ((null? alist) #f) ((comparison val (caar alist)) (car alist)) (else (associate val (cdr alist))))))) associate)))) (%make-hash-table 0 hash comparison association (make-vector size '())))) (define (make-hash-table-maker comp hash) (lambda args (apply make-hash-table (cons comp (cons hash args))))) (define make-symbol-hash-table (make-hash-table-maker eq? symbol-hash)) (define make-string-hash-table (make-hash-table-maker string=? string-hash)) (define make-string-ci-hash-table (make-hash-table-maker string-ci=? string-ci-hash)) (define make-integer-hash-table (make-hash-table-maker = modulo)) (define (%hash-table-hash hash-table key) ((hash-table-hash-function hash-table) key (vector-length (hash-table-entries hash-table)))) (define (%hash-table-find entries associate hash key) (associate key (vector-ref entries hash))) (define (%hash-table-add! entries hash key value) (vector-set! entries hash (cons (%make-hash-node key value) (vector-ref entries hash)))) (define (%hash-table-delete! entries compare? hash key) (let ((entrylist (vector-ref entries hash))) (cond ((null? entrylist) #f) ((compare? key (caar entrylist)) (vector-set! entries hash (cdr entrylist)) #t) (else (let loop ((current (cdr entrylist)) (previous entrylist)) (cond ((null? current) #f) ((compare? key (caar current)) (set-cdr! previous (cdr current)) #t) (else (loop (cdr current) current)))))))) (define (%hash-table-for-each proc entries) (do ((index (- (vector-length entries) 1) (- index 1))) ((< index 0)) (for-each proc (vector-ref entries index)))) (define (%hash-table-maybe-resize! hash-table) (let* ((old-entries (hash-table-entries hash-table)) (hash-length (vector-length old-entries))) (if (> (hash-table-count hash-table) hash-length) (let* ((new-length (* 2 hash-length)) (new-entries (make-vector new-length '())) (hash (hash-table-hash-function hash-table))) (%hash-table-for-each (lambda (node) (%hash-table-add! new-entries (hash (%hash-node-key node) new-length) (%hash-node-key node) (%hash-node-value node))) old-entries) (hash-table-set-entries! hash-table new-entries))))) (define (hash-table-ref hash-table key . maybe-default) (cond ((%hash-table-find (hash-table-entries hash-table) (hash-table-association-function hash-table) (%hash-table-hash hash-table key) key) => %hash-node-value) ((null? maybe-default) (error "hash-table-ref: no value associated with" key)) (else (car maybe-default)))) (define hash-table-get hash-table-ref) (define (hash-table-set! hash-table key value) (let ((hash (%hash-table-hash hash-table key)) (entries (hash-table-entries hash-table))) (cond ((%hash-table-find entries (hash-table-association-function hash-table) hash key) => (lambda (node) (%hash-node-set-value! node value))) (else (%hash-table-add! entries hash key value) (hash-table-set-count! hash-table (+ 1 (hash-table-count hash-table))) (%hash-table-maybe-resize! hash-table))))) (define hash-table-put! hash-table-set!) (define (hash-table-delete! hash-table key) (if (%hash-table-delete! (hash-table-entries hash-table) (hash-table-comparison-function hash-table) (%hash-table-hash hash-table key) key) (hash-table-set-count! hash-table (- (hash-table-count hash-table) 1)))) (define hash-table-remove! hash-table-delete!) (define (hash-table-exists? hash-table key) (and (%hash-table-find (hash-table-entries hash-table) (hash-table-association-function hash-table) (%hash-table-hash hash-table key) key) #t)) (define (hash-table-for-each hash-table proc) (%hash-table-for-each (lambda (node) (proc (%hash-node-key node) (%hash-node-value node))) (hash-table-entries hash-table))) (define (hash-table-fold hash-table f acc) (hash-table-for-each hash-table (lambda (key value) (set! acc (f key value acc)))) acc) (define (list->hash-table alist . args) (let ((hash-table (make-hash-table (and (not (null? args)) (car args)) (and (not (null? args)) (not (null? (cdr args))) (cadr args)) (max *default-table-size* (* 2 (length alist)))))) (for-each (lambda (elem) (hash-table-set! hash-table (car elem) (cdr elem))) alist) hash-table)) (define (hash-table->list hash-table) (hash-table-fold hash-table (lambda (key val acc) (cons (cons key val) acc)) '())) (define (hash-table-keys hash-table) (hash-table-fold hash-table (lambda (key val acc) (cons key acc)) '())) (define (hash-table-values hash-table) (hash-table-fold hash-table (lambda (key val acc) (cons val acc)) '()))
Copyright © Panu Kalliokoski (2005). All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
Software
), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED AS IS
, WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.