Title

Extensible hash table constructor

Author

Marc Feeley

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-90@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This SRFI specifies the procedure make-table, a hash table constructor compatible with SRFI 69 (Basic hash tables). The procedure make-table allows various parameters of the hash table to be specified with optional named parameters when it is constructed. These parameters are: the initial size, the minimum and maximum load factor, the key equivalence function, the key hashing function, whether the references to the keys are weak, and similarly for the values. By using optional named parameters, as specified in SRFI 89 (Optional positional and named parameters), the constructor's API can be easily extended in a backward compatible way by other SRFIs and Scheme implementations.

Rationale

SRFI 69 specifies ``basic hash tables''. It provides the procedure make-hash-table to construct hash tables, and a set of procedures to operate on hash tables.

SRFI 69's make-hash-table accepts two optional positional parameters: a procedure to test whether two keys are equal and a procedure to hash keys. There are several other hash table implementations (in Scheme and other languages) which allow other aspects of the hash table to be specified when the hash table is constructed, mainly to improve performance. For instance:

  1. The initial size of the hash table. This parameter is useful to reduce the cost of resizing the hash table when it is known that the hash table will contain many keys. Even though SRFI 69 does not provide a size parameter, the implementation in SRFI 69 handles the third optional positional parameter as a size parameter.
  2. The minimum and maximum load of the hash table. These parameters are useful to control the amount of key collisions, which affects performance of hash table lookup, insertion, and resizing. The load is a measure of space efficiency. It is the space used by the (active) key/value associations divided by the total space used by the hash table. A load close to 0 offers fast lookup and insertion operations, but is not space efficient. A load close to 1 is very space efficient, but operations are slow. The implementation can dynamically resize the hash table to keep the load roughly between the minimum and maximum load (the load may not be exactly within the bounds due to implementation constraints, such as keeping at least one free entry in the hash table and requiring the size of the table to be a prime number).
  3. The ``strength'' of references to the keys and values. Some runtime systems support two types of object references: weak references and strong references. When the garbage collector determines that the only references to an object are weak, it can reclaim that object. In the case of hash tables, if the keys are considered to be weak references then the garbage collector can reclaim the key and that entry of the hash table if the key is only referred to with weak references. This feature is particularly useful with eq? hash tables to attach auxiliary information to objects without affecting the liveness of these objects (for example serial numbers and access mutexes). Similarly, if the values are considered to be weak references then the garbage collector can reclaim the value and that entry of the hash table if the value is only referred to with weak references.

These parameters have reasonable implementation dependent default values:

  1. Initial size: 0 or a small number
  2. Minimum load: 0.5
  3. Maximum load: 0.8
  4. Key reference weakness: false
  5. Value reference weakness: false

Given that there are several parameters and that they all have reasonable default values, it is appropriate to use optional named parameters to pass them to the constructor. This makes it possible for other SRFIs and Scheme implementations to extended the API in a backward compatible way.

Specification

The make-table procedure accepts optional named parameters as though it was defined using the SRFI 89 (Optional positional and named parameters) define* form like this:

    (define* (make-table
              (test:        test        equal?)
              (hash:        hash        H)
              (size:        size        S)
              (min-load:    min-load    MINL)
              (max-load:    max-load    MAXL)
              (weak-keys:   weak-keys   #f)
              (weak-values: weak-values #f))
      ...)

The default values S, MINL, and MAXL are implementation dependent. The default value H is also implementation dependent but it must be consistent with the test parameter (i.e. (test X Y) implies (= (hash X) (hash Y))). If supplied explicitly the test parameter must be a two-parameter equivalence procedure, the hash parameter must be a one-parameter hash function consistent with test, the size parameter must be a nonnegative exact integer, and the min-load and max-load parameters must be real numbers in the range 0 to 1 and min-load must be less than max-load.

The parameters size, min-load, max-load, weak-keys and weak-values provide information that can be useful to improve the performance of the runtime system. However, the implementation can choose to ignore any of these parameters if they have no meaning for the runtime system. In other words these are purely advisory parameters.

    (define t1 (make-table))

    (define t2 (make-table size: 1000))

    (define t3 (make-table hash: string-length))

    (define t4 (make-table test: eq? weak-keys: #t))

    (define obj2sn (make-table test: eq? weak-keys: #t))
    (define sn2obj (make-table test: equal? weak-values: #t))
    (define last-sn 0)

    (define (object->serial-number obj)
      (or (hash-table-ref/default obj2sn obj #f)
          (let ((sn (+ last-sn 1)))
            (set! last-sn sn)
            (hash-table-set! obj2sn obj sn)
            (hash-table-set! sn2obj sn obj)
            sn)))

    (define (serial-number->object sn)
      (hash-table-ref/default sn2obj sn #f))

    (define obj1 (list 1 11))
    (define obj2 (list 2 22))
    (define obj3 (list 3 33))

    (object->serial-number obj1)  ==>  1
    (object->serial-number obj2)  ==>  2
    (object->serial-number obj1)  ==>  1

    (set! obj2 'void)

    (object->serial-number obj3)  ==>  3
    (serial-number->object 1)     ==>  (1 11)
    (serial-number->object 2)     ==>  (2 22) or #f

Implementation

The following implementation requires SRFI 69 (Basic hash tables) and SRFI 89 (Optional positional and named parameters). It defines the make-table procedure to accept its parameters using optional named parameters and passes them to make-hash-table:

(define make-table
  (let ((absent (list 'absent)))
    (lambda* ((test:        test        absent)
              (hash:        hash        absent)
              (size:        size        0)
              (min-load:    min-load    0)
              (max-load:    max-load    1)
              (weak-keys:   weak-keys   #f)
              (weak-values: weak-values #f))
      (cond ((eq? test absent)
             (if (eq? hash absent)
                 (make-hash-table)
                 (make-hash-table equal? hash)))
            ((eq? hash absent)
             (make-hash-table test))
            (else
             (make-hash-table test hash))))))

Note that the implementation conforms to this SRFI even though it ignores the parameters size, min-load, max-load, weak-keys and weak-values. A more complete treatment of these parameters would require implementation dependent changes to the runtime system.

Copyright

Copyright (C) Marc Feeley (2006). All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Donovan Kolbly