This SRFI is currently in ``draft'' status.
To see an explanation of
each status that a SRFI can hold, see here.
To provide input on this SRFI, please
mail to
<srfi minus 113 at srfi dot schemers dot org>
. See
instructions here to
subscribe to the list. You can access previous messages via
the archive of the mailing list.
Sets and bags (also known as multisets) are unordered collections that can contain any Scheme object. Sets enforce the constraint that no two elements can be the same in the sense of the set's associated equality predicate; bags do not. Integer sets are a variant of sets that can only contain non-negative exact integers that are less than a maximum value specified when the integer set is created, and that always use numerical equality as their equality predicate.
16. What provisions should we have for controlling which element of a set is preserved when a new element equal to an existing element is added?
Sets are a standard part of the libraries of many high-level programming languages, including Smalltalk, Java, and C++. Racket provides general sets similar to those of this proposal, though with fewer procedures, and there is a Chicken egg called sets
(unfortunately undocumented), which provides a minimal set of procedures.
Bags are useful for counting anything from a fixed set of possibilities, e.g. the number of each type of error in a log file or the number of uses of each word in a lexicon drawn from a body of documents. Although other data structures can serve the same purpose, using bags clearly expresses the programmer's intent and can be optimized.
Although sets subsume integer sets, providing them separately allows for increased implementation efficiency. There are two Chicken eggs that provide integer sets: iset
, which is conceptually similar to the integer sets described here, but using bignums rather than bytevectors; and cis
, which uses lists of integer intervals.
Insofar as possible, the names in this SRFI are harmonized with the names used for ordered collections (lists, strings, vectors, and bytevectors) in Scheme. However, size
is used instead of length
to express the number of elements in the collection, because length
implies order.
Although it's possible to use the general sets of this SRFI to contain characters, the use of SRFI 14 is recommended instead. The names in this SRFI are harmonized with SRFI 14, except that SRFI 14 does not contain analogues of the set>>?
, set<=?
, set>=?
, set-remove
, or set-partition
procedures.
Sets do not have a lexical syntax representation. It's possible to use SRFI 108 quasi-literal constructors to create them in code, but this SRFI does not standardize how that is done.
The interface to general sets and bags depends on SRFI 114 comparators, despite that SRFI having a higher number than this one (for hysterical raisins). Comparators conveniently package the equality predicate of the set with the hash function or comparison procedure needed to implement the set efficiently.
During the discussion of this SRFI, a new expression type enumeration-case
similar to case
was proposed. It would dispatch based on a member of an enumeration set, with enumeration-specific error checking? Specifically, an error would be signaled if the cases are not exhaustive, or if any clause datum is not a member of the enumeration set. Unfortunately, due to phasing problems, enumeration-case
would only work with enumerations defined using define-enum
, and it would be awkward to implement it with syntax-rules
only. Therefore, it was not included.
Sets, bags, and integer sets are mutually disjoint, and disjoint from other types of Scheme objects.
It is an error for any procedure defined in this SRFI to be invoked on sets or bags with different comparators.
The procedures of this SRFI, by default, are "pure functional" — they do not alter their parameters. However, this SRFI defines a set of "linear-update" procedures, all of whose names end in !
. They have hybrid pure-functional/side-effecting semantics: they are allowed, but not required, to side-effect one of their parameters in order to construct their result. An implementation may legally implement these procedures as pure, side-effect-free functions, or it may implement them using side effects, depending upon the details of what is the most efficient or simple to implement in terms of the underlying representation.
It is an error to rely upon these procedures working by side effect. For example, this is not guaranteed to work:
(let* ((set1 (set 'a 'b 'c)) ; set1 = {a,b,c}. (set2 (set-adjoin! 'd))) ; Add d to {a,b,c}. set1) ; Could be either {a,b,c} or {a,b,c,d}.
However, this is well-defined:
(let ((set1 (set 'a 'b 'c))) (set-adjoin! set1 'd)) ; Add d to {a,b,c}.
So clients of these procedures write in a functional style, but must additionally be sure that, when the procedure is called, there are no other live pointers to the potentially-modified character set (hence the term "linear update").
There are two benefits to this convention:
Implementations are free to provide the most efficient possible implementation, either functional or side-effecting.
Programmers may nonetheless continue to assume that sets are purely functional data structures: they may be reliably shared without needing to be copied, uniquified, and so forth.
In practice, these procedures are most useful for efficiently constructing sets in a side-effecting manner, in some limited local context, before passing the character set outside the local construction scope to be used in a functional manner.
Scheme provides no assistance in checking the linearity of the potentially side-effected parameters passed to these functions — there's no linear type checker or run-time mechanism for detecting violations.
Note that if the implementation is entirely pure functional, it is allowed to return existing sets rather than newly allocated ones, even where this SRFI explicitly says otherwise.
Constructors: set
, set-adjoin
, set-delete
, set-unfold
Predicates: set?
, set-contains?
, set-empty?
, set-disjoint?
Accessor: set-value
Linear update procedures: set-adjoin!
, set-delete!
, set-intern!
The whole set: set-size
, set-find
, set-count
, hash-table-any?
, hash-table-every?
Mapping and folding: set-map
, set-for-each
, set-fold
, hash-table-filter
, hash-table-filter!
, hash-table-remove
, hash-table-partition
Copying and conversion: set-copy
, set-empty-copy
, set->list
, list->set
, list->set!
Subsets: hash-table=?
, hash-table<?
, hash-table>?
, hash-table<=?
, hash-table>=?
Set operations: hash-table-union!
, hash-table-intersection!
, hash-table-difference!
Additional bag procedures: bag-element-count
, bag-for-each-unique
, bag-fold-unique
, bag-increment!
, bag-decrement!
, bag->set
, set->bag
, set->bag!
Additional integer set procedures: integer-set
, make-universal-integer-set
, integer-set-complement
, integer-set-complement!
, integer-set-min
, integer-set-max
(set
comparator element ... )
Returns a newly allocated empty set. The comparator argument is a SRFI 114 comparator. The elements are used to initialize the set.
(set-adjoin
set
element ... )
Returns a newly allocated set that uses the same comparator as set and contains all the values of set, and in addition each element unless it is already the same (in the sense of set's equality predicate) as an existing member. It is an error to add an element to set that does not return #t
when passed to the type test procedure of the comparator with which set was defined.
(set-delete
set
element ... )
Returns a newly allocated set containing all the values of set except for the elements. Any element that is not a member of the set is ignored.
(set-unfold
comparator continue? mapper successor seed)
Create a newly allocated set as if by set
using comparator. If the result of applying the predicate continue? to seed is #f
, return the set. Otherwise, apply the procedure mapper to seed. The value that mapper returns is added to the set. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.
(set?
obj)
Returns #t
if obj is a set, and #f
otherwise.
(set-member?
set
element)
Returns #t
if element is a member of set and #f
otherwise.
(set-empty?
set)
Returns #t
if set has no elements and #f
otherwise.
(set-disjoint?
set1 set2)
Returns #t
if set1 and set2 have no elements in common and #f
otherwise.
(set-value
set
element)
Returns the element of set that is equal, in the sense of set's equality predicate, to element. If element is not a member of set, it is returned.
(set-adjoin!
set
element)
Returns a set that uses the same comparator as set and contains all the values of set, and in addition each element unless it is already a member. It is an error to add an element to set that does not return #t
when passed to the type test procedure of the comparator with which set was defined.
(set-delete!
set
element)
Returns a set containing all the values of set except for the elements. Any element that is not equal (in the sense ofset's equality predicate) to some member of the set is ignored.
(set-intern!
set
element)
Returns two values. If there is an element of set that is equal, in the sense of set's equality predicate, to element, then that element is returned as the first value and set is returned as the second value. If there is no such element, then the first value is element and the second value is the result of (set-add!
set element)
.
(set-search!
set element failure success)
The set is searched for element. If it is not found, then the failure procedure is tail-called with two continuation arguments, insert and ignore, and is expected to tail-call one of them. If element is found, then the success procedure is tail-called with the matching element of set and two continuations, update and remove, and is expected to tail-call one of them.
The effects of the continuations are as follows (where obj is any Scheme object):
Invoking (
insert obj)
causes element to be inserted into set.
Invoking (
ignore obj)
causes set to remain unchanged.
Invoking (
update new-element obj)
causes new-element to be inserted into set.
Invoking (
remove obj)
causes the matching element of set to be removed from it.
In all cases, two values are returned: the possibly updated set and obj.
(set-size
set)
Returns the number of elements in set as an exact integer.
(set-count
predicate
set)
Returns the number of elements of set that satisfy predicate as an exact integer.
(set-find
predicate set failure)
Returns an arbitrarily chosen element of set that satisfies predicate, or the result of invoking failure with no arguments if there is none.
(set-any?
predicate
set)
Returns #t
if any element of set satisfies predicate, or #f
otherwise. Note that this differs from the SRFI 1 analogue because it does not return a useful value.
(set-every?
predicate
set)
Returns #t
if every element of set satisfies predicate, or #f
otherwise. Note that this differs from the SRFI 1 analogue because it does not return a useful value.
(set-map
comparator
proc
set)
Applies proc to each element of set in arbitrary order and returns a newly allocated set, created as if by set
, which contains the results of the applications. For example:
(set-map string-ci=? symbol->string (set eq? 'foo 'bar 'baz)) => (set string-ci=? "foo" "bar" "baz")
(set-for-each
proc
set)
Applies proc to set in arbitrary order, discarding the returned values. Returns an unspecified result.
(set-fold
proc
nil
set)
Invokes proc on each member of set in arbitrary order, passing the result of the previous invocation as a second argument. For the first invocation, nil is used as the second argument. Returns the result of the last invocation.
(set-filter
predicate
set)
Returns a newly allocated set, created as if by set-empty-copy
, containing just the elements of set that satisfy predicate.
(set-filter!
predicate
set)
A linear update procedure that returns a set containing just the elements of set that satisfy predicate.
(set-remove
predicate
set)
Returns a newly allocated set, created as if by set-empty-copy
, containing just the elements of set that do not satisfy predicate.
(set-partition
predicate
set)
Returns two values: a newly allocated set, created as if by set-empty-copy
, containing just the elements of set that satisfy predicate, and another newly allocated set, created as if by set-empty-copy
, containing just the elements of set that do not satisfy predicate.
(set-copy
set)
Returns a newly allocated set containing the elements of set, and using the same comparator.
(set-empty-copy
set)
Returns a newly allocated set containing no elements, and using the same comparator as set.
(set->list
set)
Returns a newly allocated list containing the members of set in unspecified order. However, repeated calls to this procedure will return a list in the same order until the set is mutated.
(list->set
comparator list)
Returns a newly allocated set, created as if by set
using comparator, that contains the elements of list.
(list->set!
comparator list)
Returns a set using comparator that contains the elements of list.
(set=?
set ...)
Returns #t
if each set contains the same elements.
(set<?
set ...)
Returns #t
if each set other than the last is a proper subset of the following set, and #f
otherwise.
(set>?
set ...)
Returns #t
if each set other than the last is a proper superset of the following set, and #f
otherwise.
(set<=?
set ...)
Returns #t
if each set other than the last is a subset of the following set, and #f
otherwise.
(set>=?
set ...)
Returns #t
if each set other than the last is a superset of the following set, and #f
otherwise.
(set-union
set1
set2 ...)
(set-intersection
set1
set2 ...)
(set-difference
set1
set2 ...)
(set-xor
set1
set2)
Returns a newly allocated set that is the union, intersection, asymmetric difference, or symmetric difference of the sets. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.
(set-union!
set1
set2 ...)
(set-intersection!
set1
set2 ...)
(set-difference!
set1
set2 ...)
(set-xor!
set1
set2)
Linear update procedures returning a set that is the union, intersection, asymmetric difference, or symmetric difference of the sets. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.
Bags are like sets, but can contain the same object more than once. However, if two elements that are the same in the sense of the equality predicate, but not in the sense of eqv?
, are both included, it is not guaranteed that they will remain distinct when retrieved from the bag. It is an error for a single procedure to be invoked on bags with different comparators.
The procedures for creating and manipulating bags are the same as those for sets, except that set
is replaced by bag
in their names, and that adding an element to a bag is effective even if the bag already contains the element. If two elements in a bag are the same in the sense of the equality predicate, the implementation may in fact store just one of them. There are no procedures named bag-xor
or bag-xor!
.
(bag-element-count
bag
element)
Returns an exact integer representing the number of times that element appears in bag.
(bag-for-each-unique
proc
bag)
Applies proc to each unique element of bag in arbitrary order, passing the element and the number of times it occurs in bag, and discarding the returned values. Returns an unspecified result.
(bag-fold-unique
proc
nil
bag)
Invokes proc on each unique element of bag in arbitrary order, passing the number of occurrences as a second argument and the result of the previous invocation as a third argument. For the first invocation, nil is used as the third argument. Returns the result of the last invocation.
(bag-increment!
bag
element
count)
(bag-decrement!
bag
element
count)
Linear update procedures that return a bag with the same elements as bag, but with the element count of element in bag increased or decreased by the exact integer count (but not less than zero).
(bag->set
bag)
(set->bag
set)
The bag->set
procedure returns a newly allocated set containing the unique elements of bag. The set->bag
procedure returns a newly allocated bag containing the elements of set. The comparator of the result is the same as the comparator of the argument.
The elements of an integer set are non-negative exact integers less than the set's limit, which is specified when it is created. Except as noted below, the procedures for creating and manipulating integer sets are the same as those for sets, except that set
is replaced by integer-set
in their names.
Wherever a newly allocated integer set is returned, it has the same limit as the source sets. It is an error for a single procedure to be invoked on integer sets with different limits.
There are no integer-set-value
or integer-set-intern!
procedures.
(integer-set
limit element ...)
Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to limit - 1, where limit is an exact non-negative integer. The set contains the elements.
(universal-integer-set
limit)
Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to limit - 1, where limit is an exact non-negative integer. The set contains all possible elements.
(range->-integer-set
limit low high)
Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to limit - 1, where limit is an exact non-negative integer. The set contains the elements from low (inclusive) to high (exclusive).
(integer-set-adjoin-range
integer-set low high)
Returns a newly allocated integer set with the same limit as integer-set and that contains all its values, and in addition each integer from low (inclusive) to high (exclusive) unless it is already equal to an existing member.
(integer-set-adjoin-range!
integer-set low high)
A linear update procedure that returns an integer set with the same limit as integer-set and that contains all its values, and in addition each integer from low (inclusive) to high (exclusive) unless it is already equal to an existing member.
(integer-set-complement
integer-set)
Returns a newly allocated integer set that is the complement of integer-set.
(integer-set-complement!
integer-set)
A linear update procedure that returns a set that is the complement of integer-set.
(integer-set-min
integer-set)
(integer-set-max
integer-set)
Returns the smallest or largest integer in integer-set, or #f
if there is none.
(integer-set-delete-min!
integer-set)
(integer-set-delete-max!
integer-set)
Linear update procedures that return two values: the smallest or largest integer in integer-set or #f
if there is none, and a set which contains all the elements of set except the integer being returned.
The implementation places the identifiers defined above into the sets
library.
Sets and bags are implemented as a thin veneer over hashtables, and integer sets over bytevectors. For clarity, the integer set implementation stores just one value into each bytevector element rather than the eight that would be possible.
The sample implementation contains the following files:
sets-impl.scm
– implementation of general setsbags-impl.scm
– implementation of bagsisets-impl.scm
– implementation of integer setscount.scm
– provides macros for conveniently looping over bytevectorssrfi-4-shim.scm
– minimal implementation of R7RS bytevectors on top of SRFI 4's u8vectorssets.sld
– an R7RS library named (sets)
sets.scm
– a Chicken librarysets-test.scm
– test suiteThe test suite will work with the Chicken test
egg, which is provided on Chibi as the (chibi test)
library.
Copyright (C) John Cowan 2013. All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.