Title

Sets, bags, integer sets, enumeration sets

Author

John Cowan

This SRFI is currently in ``draft'' status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 113 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Abstract

Sets and bags (also known as multisets) are unordered mutable collections that can contain any Scheme object. Sets enforce the constraint that no two elements can be the same in the sense of the set's associated equivalence procedure; bags do not. Integer sets are a variant of sets that can only contain non-negative exact integers that are less than a maximum value specified when the integer set is created, and that always treat numerical equality as their equivalence procedure. Enumeration sets are another variant that can only contain symbols chosen from a set of symbols represented by an enumeration type; the equivalence procedure for them is eq?.

Issues

  1. R6RS provides define-enumeration to help set up enum-types. Is this worth having? Possible syntax is:

    (define-enumeration <type-name> (<symbol> ...) <constructor>)

    This would bind <type-name> to the enum-type, and constructor to a curried version of make-enum-set that already knows what type to use.

  2. Should there be a mechanism to convert between integer sets and integers as bitvectors, as defined in SRFI 33, SRFI 60, and R6RS?

  3. Currently you can convert one set type to another via lists. Are conversions directly through sets (or bags) useful enough to justify enlarging the SRFI? What about direct conversions between other types?

  4. How about set-intern!, which is like set-value but adds the element to the set if it's not already there?

Rationale

Sets are a standard part of the libraries of many high-level programming languages, including Smalltalk, Java, and C++. Although sets subsume integer sets and enumeration sets, providing them separately allows for increased implementation efficiency.

Insofar as possible, the names in this SRFI are harmonized with the names used for ordered collections (lists, strings, vectors, and bytevectors) in Scheme.

The design of enumeration sets is founded on R6RS enumerations, but with the addition of reified enumeration types along the lines suggested by R6RS Formal Comment #262. The prefix enum is used in all cases instead of using both enum and enumeration as R6RS does. Approximate mappings are shown below.

Specification

The procedures in this SRFI are in the (scheme sets) and (srfi xxx) libraries (or (srfi :xxx) on R6RS). However, the sample implementation currently places them in the (sets) library instead.

Sets, bags, integer sets, enumeration sets, and enumeration types are mutually disjoint, and disjoint from other types of Scheme objects.

Set procedures

It is an error for a single procedure to be invoked on sets with different equivalence procedures.

Constructors

(make-set equivalence)

Returns a newly allocated empty set. equivalence is the equivalence procedure for the set. If it is coarser than equal (in the sense of R7RS-small 6.1), the implementation may signal an error; however, string-ci=? must be supported.

(set equivalence element ...)

Returns a newly allocated set with equivalence procedure equivalence and containing the elements.

(set-copy set)

Returns a newly allocated set containing the elements of set, with the same equivalence procedure.

(set-empty-copy set)

Returns a newly allocated set containing no elements with the same equivalence procedure as set.

Predicates

(set? obj)

Returns #t if obj is a set, and #f otherwise.

(set-member? set element)

Returns #t if element is a member of set and #f otherwise.

Mutators

(set-add! set element)

Adds element to set unless it is already a member. Returns an unspecified value.

(set-delete! set element)

Deletes element from set if it is a member. Returns #t if the element was a member, #f if not.

Mapping and folding

(set-map equivalence proc set)

Applies proc to each element of set in arbitrary order and returns a newly allocated set with the equivalence predicate equivalence which contains the results of the applications. For example:

(set-map string-ci=? symbol->string (set eq? 'foo 'bar 'baz)) => (set string-ci=? "foo" "bar" "baz")

(set-for-each proc set)

Applies proc to set in arbitrary order, discarding the returned values. Returns unspecified results.

(set-fold proc nil set)

Invokes proc on each member of set in arbitrary order, passing the result of the previous invocation as a second argument. For the first invocation, nil is used as the second argument. Returns the result of the last invocation.

(set-unfold equivalence continue? mapper successor seed)

Create a new set as if by make-set. If the result of applying the predicate continue? to seed is #f, return the set. Otherwise, apply the procedure mapper to seed. The value that mapper returns is added to the set. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.

Conversion

(set->list set)

Returns a newly allocated list containing the members of set in unspecified order. However, repeated calls to this procedure will return a list in the same order until the set is mutated.

(list->set equivalence list)

Returns a newly allocated set with equivalence predicate equivalence containing the elements of list.

Subsets

(set=? set ...)

Returns #t if each set contains the same elements.

(set<? set ...)

Returns #t if each set other than the last is a proper subset of the following set, and #f otherwise.

(set>? set ...)

Returns #t if each set other than the last is a proper superset of the following set, and #f otherwise.

(set<=? set ...)

Returns #t if each set other than the last is a subset of the following set, and #f otherwise.

(set>=? set ...)

Returns #t if each set other than the last is a superset of the following set, and #f otherwise.

The whole set

(set-size set)

Returns the number of elements in set.

(set-filter predicate set)

Returns a newly allocated set with the same equivalence predicate as set containing just the elements of set that satisfy predicate.

(set-remove predicate set)

Returns a newly allocated set with the same equivalence predicate as set containing just the elements of set that do not satisfy predicate.

(set-partition predicate set)

Returns two values, a newly allocated set with the same equivalence predicate as set containing just the elements of set that satisfy predicate, and another newly allocated set with the same equivalence predicate as set containing just the elements of set that do not satisfy predicate.

(set-count predicate set)

Returns the number of elements of set that satisfy predicate as an exact integer.

(set-find predicate set)

Returns an arbitrarily chosen element of set that satisfies predicate, or #f if there is none. Note the ambiguity: if you are searching for #f, you cannot tell if you have found it or not.

(set-any predicate set)

Returns #t if any element of set satisfies predicate, or #f otherwise.

(set-every predicate set)

Returns #t if every element of set satisfies predicate, or #f otherwise.

Set operations

(set-union set1 set2 ...)

(set-intersection set1 set2 ...)

(set-difference set1 set2 ...)

(set-xor set1 set2)

Returns a newly allocated set that is the union, intersection, asymmetric difference, or symmetric difference of the sets. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.

(set-union! set1 set2 ...)

(set-intersection! set1 set2 ...)

(set-difference! set1 set2 ...)

(set-xor! set1 set2)

The same as set-union, set-intersection, set-difference, and set-xor respectively, but may mutate the set1 argument.

Set value

(set-value set element)

Returns the element of set that is equal, in the sense of the equivalence predicate, to element. If element is not a member of the set, it is returned.

Bag procedures

Bags are like sets, but can contain the same object more than once. However, if two elements that are equal in the sense of the equivalence procedure, but not in the sense of eqv?, are both included, it is not guaranteed that they will remain distinct when retrieved from the bag. It is an error for a single procedure to be invoked on bags with different equivalence procedures.

The procedures for creating and manipulating bags are the same as those for sets, except that set is replaced by bag in their names, and that adding an element to a bag is effective even if the bag already contains the element. There are no procedures bag-xor, bag-xor!, or bag-value.

(bag-element-count bag element)

Returns an exact integer representing the number of times that element appears in bag.

(bag-increment! bag element count)

(bag-decrement! bag element count)

Increases or decreases the element count of element in bag by the exact integer count.

Integer set procedures

The elements of an integer set are non-negative exact integers less than the set's limit, which is specified when it is created. Except as noted below, the procedures for creating and manipulating integer sets are the same as those for sets, except that set is replaced by integer-set in their names, and references to equivalence predicates are replaced by limits, as the equivalence function is always =. Wherever a newly allocated integer set is returned, it has the same limit as the source sets. It is an error for a single procedure to be invoked on integer sets with different limits.

The procedure integer-set-value is just the identity function, so it is not provided.

(make-integer-set limit)

Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to limit - 1, where limit is an exact non-negative integer. The set is empty.

(make-universal-integer-set limit)

Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to limit - 1, where limit is an exact non-negative integer. The set contains all possible elements.

(integer-set-complement integer-set)

Returns a newly allocated integer set that is the complement of integer-set.

(integer-set-complement! integer-set)

Mutates integer-set to its complement.

(integer-set-min integer-set)

(integer-set-max integer-set)

Returns the smallest or largest integer in integer-set, or #f if there is none.

(integer-set-delete-min! integer-set)

(integer-set-delete-max! integer-set)

Returns and deletes the smallest or largest integer in integer-set, or #f if there is none.

Enumeration sets

Except as noted below, the procedures for creating and manipulating enumeration sets are the same as those for sets, except that set is replaced by enum-set in their names. Wherever a newly allocated enumeration set is returned, it has the same enumeration type as the source sets. It is an error for a single procedure to be invoked on enumeration sets with different enumeration types.

The procedure enum-set-value is just the identity function, so it is not provided.

Enum-type procedures

(make-enum-type enum-list)

Returns a newly allocated enumeration type suitable for constructing enumeration sets whose members are symbols. The elements of enum-list are either symbols or else pairs where the car is a symbol and the cdr is a distinct non-negative exact integer used to represent the symbol internally. If the element is a symbol, the implementation assigns its own choice of a distinct non-negative exact integer. These integers are called indexes. The symbols are said to be in the enumeration type. In R6RS the function of this procedure is provided as part of make-enumeration.

(enum-type-enums enum-type)

Return a newly allocated alist mapping the symbols in enum-type to their indexes.

(enum-type-index enum-type symbol)

Return the index of symbol in enum-type, or #f if it is not in enum-type. The R6RS equivalent is the procedure returned by enum-type-indexer when applied to an enum-set belonging to enum-type.

(enum-type-symbol enum-type index)

Returns the symbol whose index in enum-type is index, or #f if there is none.

(enum=? enum-type symbol ...)

(enum<? enum-type symbol ...)

(enum>? enum-type symbol ...)

(enum<=? enum-type symbol ...)

(enum>=? enum-type symbol ...)

Returns #t if the indexes of the symbols are equal (which implies that the symbols themselves are eq?), monotonically increasing, monotonically decreasing, monotonically nondecreasing, and monotonically nonincreasing respectively, and #f otherwise.

Enum-set procedures

(make-enum-set enum-type)

Returns a newly allocated enumeration set. The possible elements of the set are the symbols in enum-type. The set is empty. The approximate R6RS equivalents are enum-set-constructor and make-enumeration.

(make-universal-enum-set enum-type)

Returns a newly allocated enumeration set. The possible elements of the set are the symbols in enum-type. The set contains all possible elements. The approximate R6RS equivalent is enum-set-universe.

(enum-set enum-type element ...)

Returns a newly allocated enumeration set. The possible elements of the set are the symbols in enum-type. The set is initialized to contain the elements. There is no R6RS equivalent.

(list->enum-set enum-type list)

Returns a newly allocated enumeration set. The possible elements of the set are the symbols in enum-type. The set is initialized to contain the elements of list. There is no R6RS equivalent.

(enum-set-complement enum-set)

Returns a newly allocated enumeration set that is the complement of enum-set. This procedure is also in R6RS.

(enum-set-complement! enum-set)

Mutates enum-set to its complement. This procedure is also in R6RS.

(enum-set-min integer-set)

(enum-set-max integer-set)

Returns the smallest or largest symbol in symbol index order in enum-set, or #f if there is none.

(enum-set-delete-min! enum-set)

(enum-set-delete-max! enum-set)

Returns and deletes the smallest or largest symbol in symbol index order in enum-set, or #f if there is none.

(enum-set-projection enum-set enum-type)

Returns a newly allocated enumeration set of type enum-type. Its elements are the symbols belonging to enum-set, ignoring any symbols which are not in enum-type. This procedure is also in R6RS, but uses a second enum-set in place of enum-type.

Implementation

Note: the present implementation does not yet support set-find, set-any, set-every, or the corresponding bag, integer set, and enum-set procedures.

Sets and bags are implemented as a thin veneer over hashtables, and integer sets over bytevectors. For clarity, the integer set implementation stores just one value into each bytevector element rather than the eight that would be possible. In turn, enumeration types are implemented over a hashtable from symbols to indexes, and enumeration sets over integer sets.

A sample implementation is here. The files sets-impl.scm, bags-impl.scm, isets-impl.scm, and enums-impl.scm are the basic code libraries for sets, bags, integer sets, and enumeration sets respectively. Each is divided into a section that makes use of the underlying collection and a section that doesn't. The file count.scm provides macros for conveniently looping over the bytevectors that underlie integer sets and enumeration sets. The file srfi-4-shim.scm is a minimal implementation of R7RS bytevectors on top of SRFI 4's u8vectors, just sufficient for this implementation, and may be useful for R5RS systems that do not provide R7RS bytevectors.

Two library definition files are provided: sets.sld provides an R7RS library named (sets) that exports the public procedures of this SRFI (the extension .sld is used by Chibi for library definition files). sets.scm is equivalent, but uses Chicken module syntax.

The file sets-test.scm is a test suite for this library. It uses Chicken's test egg, which is also provided on Chibi as an R7RS library named (chibi test).

Copyright

Copyright (C) John Cowan 2013. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Mike Sperber