This SRFI is currently in ``draft'' status.
To see an explanation of
each status that a SRFI can hold, see here.
To provide input on this SRFI, please
mail to
<srfi minus 113 at srfi dot schemers dot org>
. See
instructions here to
subscribe to the list. You can access previous messages via
the archive of the mailing list.
Sets and bags (also known as multisets) are unordered mutable collections that can contain any Scheme object. Sets enforce the constraint that no two elements can be the same in the sense of the set's associated equality predicate; bags do not. Integer sets are a variant of sets that can only contain non-negative exact integers that are less than a maximum value specified when the integer set is created, and that always treat numerical equality as their equality predicate. Likewise, character sets are sets that can only contain characters; their equality predicate is char=?
. Finally, enumeration sets are another variant that can only contain symbols chosen from a set of symbols represented by an enumeration type; the equality predicate for them is eq?.
(resolved)
(resolved)
(resolved)
(resolved)
(resolved)
(resolved)
(resolved)
Should we switch to unique enum objects rather than symbols?
(resolved)
(resolved)
(resolved)
Should we stick to just comparators, or also allow equality predicates?
Should we switch to six libraries (sets, bags, integer sets, character sets, enum sets, and set-bag conversion), or stick with a single library? (This is not about dividing the SRFI itself.)
Should we add a cursor API similar to SRFI 14's?
Should we add comparators for the various types?
Sets are a standard part of the libraries of many high-level programming languages, including Smalltalk, Java, and C++. Racket provides general sets similar to those of this proposal, though with fewer procedures, and there is a Chicken egg called sets
(unfortunately undocumented), which provides a minimal set of procedures.
Bags are useful for counting anything from a fixed set of possibilities, e.g. the number of each type of error in a log file or the number of uses of each word in a lexicon drawn from a body of documents. Although other data structures can serve the same purpose, using bags clearly expresses the programmer's intent and can be optimized.
Although general sets subsume all other types of sets, providing a few types separately allows for increased implementation efficiency. In particular, sets whose elements are restricted to non-negative exact integers within a small range are provided. There are two Chicken eggs that provide integer sets: iset
, which is conceptually similar to the integer sets described here, but using bignums rather than bytevectors; and cis
, which uses lists of integer intervals.
Character sets are already specified by SRFI 14, The interface is not as complete as this SRFI, so an overlay of procedures is provided here.
The design of enumeration sets is founded on R6RS enumerations, but with the addition of reified enumeration types along the lines suggested by R6RS Formal Comment #262. The prefix enum
is used in all cases instead of using both enum
and enumeration
as R6RS does. Approximate mappings are shown below.
Insofar as possible, the names in this SRFI are harmonized with the names used for ordered collections (lists, strings, vectors, and bytevectors) in Scheme. However, size
is used instead of length
to express the number of elements in the collection, because length
implies order.
Sets do not have a lexical syntax representation. It's possible to use SRFI 108 quasi-literal constructors to create them in code, but this SRFI does not standardize how that is done.
The procedures in this SRFI are in the (srfi 113)
library (or (srfi :113)
on R6RS). However, the sample implementation currently places them in the (sets)
library instead.
Sets, bags, integer sets, enumeration sets, and enumeration types are mutually disjoint, and disjoint from other types of Scheme objects. Character sets belong to the type defined by SRFI 14, but are disjoint from all other types.
It is an error for any procedure defined in this SRFI to be invoked on sets or bags with different comparators.
Procedures whose names end with !
are linear update procedures in the sense of SRFI 1. This means that an implementation may legally implement these procedures as pure, side-effect-free functions, or it may implement them using side effects, depending upon the details of what is the most efficient or simple to implement in terms of the underlying representation. Furthermore, if the implementations of all linear update procedures on a given type of set are pure, then the procedures which are said to return new sets are not required to actually allocate them: they may return already existing objects. See SRFI 1 for the rationale.
(set
comparator element ...)
Returns a new set. The comparator argument is either a SRFI 114 comparator or else a procedure, in which case it is the equality predicate for the set. Any comparator with an equality predicate and a corresponding hash function may be used. Implementations must accept the equality predicates eq?
, eqv?
, equal
, string=?
, and string-ci=?
, and may accept others. The elements are used to initialize the set.
(set-add
set element ...)
Returns a new set containing the elements of set, and in addition the specified elements. The new set uses the same comparator or equality predicate as set. It is an error to pass an element to this procedure that does not return #t
when passed to the type test procedure of the comparator with which set was defined.
(set-delete
set element ...)
Returns a new set containing the the elements of set with the possible exception of the elements, if they are present in set. The new set uses the same comparator or equality predicate.
(set-copy
set)
Returns a new set containing the elements of set, and using the same comparator or equality predicate.
(set-empty-copy
set)
Returns a new set containing no elements, and using the same comparator or equality predicate as set.
(set?
obj)
Returns #t
if obj is a set, and #f
otherwise.
(set-contains?
set
element)
Returns #t
if element is a member of set and #f
otherwise.
(set-value
set
element)
Returns the element of set that is equal, in the sense of set's equality predicate, to element. If element is not a member of set, it is returned.
(set-add!
set
element ...)
Returns a set containing the elements of set, and in addition the specified elements. The set uses the same comparator or equality predicate as set. It is an error to pass an element to this procedure that does not return #t
when passed to the type test procedure of the comparator with which set was defined.
(set-delete!
set
element ...)
Returns a set containing the the elements of set with the possible exception of the elements, if they are present in set. The new set uses the same comparator or equality predicate..
(set-intern!
set
element)
Returns two values. If there is an element of set that is equal, in the sense of set's equality predicate, to element, then that element is returned as the first value and set is returned as the second value. If there is no such element, then the first value is element and the second value is the result of (set-add!
set element)
.
(set-map
comparator
proc
set)
Applies proc to each element of set in arbitrary order and returns a new set, created as if by set
, which contains the results of the applications. For example:
(set-map string-ci=? symbol->string (set eq? 'foo 'bar 'baz)) => (set string-ci=? "foo" "bar" "baz")
(set-for-each
proc
set)
Applies proc to set in arbitrary order, discarding the returned values. Returns an unspecified result.
(set-fold
proc
nil
set)
Invokes proc on each member of set in arbitrary order, passing the result of the previous invocation as a second argument. For the first invocation, nil is used as the second argument. Returns the result of the last invocation.
(set-unfold
comparator continue? mapper successor seed)
Create a new set as if by set
. If the result of applying the predicate continue? to seed is #f
, return the set. Otherwise, apply the procedure mapper to seed. The value that mapper returns is added to the set. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.
(set->list
set)
Returns a newly allocated list containing the members of set in unspecified order. However, repeated calls to this procedure will return a list in the same order until the set is mutated.
(list->set
comparator list)
Returns a new set, created as if by set
, that contains the elements of list.
(list->set!
set list)
Returns set with the elements of list added to it.
Note that these functions do not obey the trichotomy law, and thus do not specify a total order on sets.
(set=?
set ...)
Returns #t
if each set contains the same elements.
(set<?
set ...)
Returns #t
if each set other than the last is a proper subset of the following set, and #f
otherwise.
(set>?
set ...)
Returns #t
if each set other than the last is a proper superset of the following set, and #f
otherwise.
(set<=?
set ...)
Returns #t
if each set other than the last is a subset of the following set, and #f
otherwise.
(set>=?
set ...)
Returns #t
if each set other than the last is a superset of the following set, and #f
otherwise.
(set-size
set)
Returns the number of elements in set as an exact integer.
(set-filter
predicate
set)
Returns a new set, created as if by set-empty-copy
, containing just the elements of set that satisfy predicate.
(set-filter!
predicate
set)
Returns a set containing just the elements of set that satisfy predicate.
(set-remove
predicate
set)
Returns a new set, created as if by set-empty-copy
, containing just the elements of set that do not satisfy predicate.
(set-partition
predicate
set)
Returns two values: a new set, created as if by set-empty-copy
, containing just the elements of set that satisfy predicate, and another new set, created as if by set-empty-copy
, containing just the elements of set that do not satisfy predicate.
(set-count
predicate
set)
Returns the number of elements of set that satisfy predicate as an exact integer.
(set-find
predicate set failure)
Returns an arbitrarily chosen element of set that satisfies predicate, or the result of invoking failure with no arguments if there is none.
(set-any?
predicate
set)
Returns #t
if any element of set satisfies predicate, or #f
otherwise. Note that this differs from the SRFI 1 analogue because it does not return a useful value.
(set-every?
predicate
set)
Returns #t
if every element of set satisfies predicate, or #f
otherwise. Note that this differs from the SRFI 1 analogue because it does not return a useful value.
(set-union
set1
set2 ...)
(set-intersection
set1
set2 ...)
(set-difference
set1
set2 ...)
(set-xor
set1
set2)
Returns a new set that is the union, intersection, asymmetric difference, or symmetric difference of the sets. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.
(set-union!
set1
set2 ...)
(set-intersection!
set1
set2 ...)
(set-difference!
set1
set2 ...)
(set-xor!
set1
set2)
The same as set-union
, set-intersection
, set-difference
, and set-xor
respectively, except that they may mutate the set1 argument.
Bags are like sets, but can contain the same object more than once. The procedures for creating and manipulating bags are the same as those for sets, except that set
is replaced by bag
in their names. If two elements in a bag are equal in the sense of the equality predicate, the implementation may in fact store just one of them.
There are no procedures bag-xor
or bag-xor!
.
(bag-element-count
bag
element)
Returns an exact integer representing the number of times that element appears in bag.
(bag-for-each-unique
proc
bag)
Applies proc to each unique element of bag in arbitrary order, passing the element and the number of times it occurs in bag, and discarding the returned values. Returns an unspecified result.
(bag-fold-unique
proc
nil
bag)
Invokes proc on each unique element of bag in arbitrary order, passing the number of occurrences as a second argument and the result of the previous invocation as a third argument. For the first invocation, nil is used as the third argument. Returns the result of the last invocation.
(bag-increment!
bag
element
n)
(bag-decrement!
bag
element
n)
Returns a bag containing the elements of bag, but with n more or fewer copies of element.
(bag->set
bag)
(set->bag
set)
The bag->set
procedure returns a new set containing the unique elements of bag. The set->bag
procedure returns a new bag containing the elements of set. The comparator or equality predicate of the result is the same as the comparator or equality predicate of the argument.
The elements of a character set are characters. Except as noted below, the procedures for creating and manipulating character sets are the same as those for sets, except that set
is replaced by char-set
in their names, and there are no comparator arguments, as the equality predicate is always char=?
.
There is no char-set-intern!
procedure.
The character sets of this SRFI belong to the same type as the character sets of SRFI 14, at least in implementations that support both SRFIs.
(char-set
element ...)
Returns a character set which contains the elements.
(make-universal-char-set)
Returns a character set containing all possible elements.
(char-set-complement
char-set)
Returns a new character set that is the complement of char-set.
(char-set-complement!
char-set)
Returns a character set that is the complement of char-set, possibly mutating char-set in the process.
(char-set-min
char-set)
(char-set-max
char-set)
Returns the smallest or largest character in char-set, or #f
if there is none.
The elements of an integer set are non-negative exact integers less than the set's limit, which is specified when it is created. Except as noted below, the procedures for creating and manipulating integer sets are the same as those for sets, except that set
is replaced by integer-set
in their names, and references to comparators are replaced by limits, as the equality predicate is always =
. Wherever an integer set is returned, it has the same limit as the source sets. It is an error for a single procedure to be invoked on integer sets with different limits.
There is no integer-set-intern!
procedure.
(integer-set
limit element ...)
Returns a new integer set. The possible elements of the set are the exact integers from 0 to limit - 1, where limit is an exact non-negative integer. The set contains the elements.
(make-universal-integer-set
limit)
Returns a new integer set. The possible elements of the set are the exact integers from 0 to limit - 1, where limit is an exact non-negative integer. The set contains all possible elements.
(integer-set-complement
integer-set)
Returns a new integer set that is the complement of integer-set.
(integer-set-complement!
integer-set)
Returns a character set that is the complement of char-set, possibly mutating char-set in the process.
(integer-set-min
integer-set)
(integer-set-max
integer-set)
Returns the smallest or largest integer in integer-set, or #f
if there is none.
(integer-set->integer
integer-set)
(integer->integer-set
integer limit)
The integer-set->integer
procedure returns a non-negative exact integer which encodes integer-set. When considered as a binary number, the ones bit (least significant bit) of the integer encodes the presence of 0 in the set, the twos bit of the integer encodes the presence of 1, and so on up to the most significant bit, which encodes the presence of limit - 1. The integer->integer-set
returns an integer set whose limit is limit after performing the reverse transformation on integer (which must not be negative).
Except as noted below, the procedures for creating and manipulating enumeration sets are the same as those for sets, except that set
is replaced by enum-set
in their names. Wherever a new enumeration set is returned, it has the same enumeration type as the source sets. It is an error for a single procedure to be invoked on enumeration sets with different enumeration types.
(define-enumeration-type
<type-name> (
<constructor> <symbol> ...)
<predicate>)
Creates an enumeration type whose members are <symbol>s. The identifier <type-name> is bound to the enumeration type. The identifier <constructor> is bound to a procedure similar to enum-set
, except that it does not accept an enum-type argument. Finally, the identifier <predicate> is bound to a predicate which is true if its argument is an enumeration set whose type is the declared type, and false otherwise. The syntax is reminiscent of define-record-type
.
There is no enum-set-intern!
procedure.
(make-enum-type
enum-list)
Returns an enumeration type suitable for constructing enumeration sets whose members are symbols. The elements of enum-list are either symbols or else pairs where the car is a symbol and the cdr is an exact integer called the value of the symbol. If the element is a symbol, the value is greater by 1 than the value of the previous element, or 0 if there is no previous element. Values are normally distinct, but make-enum-type
intentionally does not enforce this, for compatibility with C/C++ external routines. The symbols are said to be in the enumeration type. In R6RS the function of this procedure is provided as part of make-enumeration
, but the values (called indexes) are assigned from 0 on up.
(enum-type?
obj)
Returns #t
if obj is an enumeration type, and #f
otherwise.
(enum-type->alist
enum-type)
Return a newly allocated alist mapping the symbols in enum-type to their values.
(enum-type-symbol-value
enum-type symbol)
Return the value of symbol in enum-type, or #f
if it is not in enum-type. A loose R6RS analogue is the procedure returned by enum-type-indexer
when applied to an enum-set belonging to enum-type
.
(enum-type-symbol
enum-type value)
Returns the first symbol whose value in enum-type is value, or #f
if there is none.
(enum-value=?
enum-type symbol ...)
(enum-value<?
enum-type symbol ...)
(enum-value>?
enum-type symbol ...)
(enum-value<=?
enum-type symbol ...)
(enum-value>=?
enum-type symbol ...)
Returns #t
if the values of the symbols are equal (which implies that the symbols themselves are eq?
), monotonically increasing, monotonically decreasing, monotonically nondecreasing, and monotonically nonincreasing respectively, and #f
otherwise.
(enum-set
enum-type symbol ... )
Returns a new enumeration set. The possible elements of the set are the symbols in enum-type. The set is initialized to contain the symbols. The approximate R6RS equivalents are enum-set-constructor
and make-enumeration
.
(make-universal-enum-set
enum-type)
Returns a new enumeration set. The possible elements of the set are the symbols in enum-type. The set contains all possible elements. The approximate R6RS equivalent is enum-set-universe
.
(list->enum-set
enum-type
list)
Returns a new enumeration set. The possible elements of the set are the symbols in enum-type. The set is initialized to contain the elements of list. There is no R6RS equivalent.
(enum-set-complement
enum-set)
Returns a new enumeration set that is the complement of enum-set. This procedure is also in R6RS.
(enum-set-complement!
enum-set)
Returns an enumeration set that is the complement of enum-set, possibly mutating enum-set in the process. This procedure is also in R6RS, where it is required to mutate its argument.
(enum-set-min
integer-set)
(enum-set-max
integer-set)
Returns the smallest or largest symbol in symbol index order in enum-set, or #f
if there is none.
(enum-set-projection
enum-set
enum-type)
Returns a new enumeration set of type enum-type. Its elements are the symbols belonging to enum-set, ignoring any symbols which are not in enum-type. This procedure is also in R6RS, but uses a second enum-set in place of enum-type.
Sets and bags are implemented as a thin veneer over hashtables, and integer sets over bytevectors. Character sets just are SRFI 14 character sets. For clarity, the integer set implementation stores just one value into each bytevector element rather than the eight that would be possible. In turn, enumeration types are implemented over a hashtable from symbols to indexes, and enumeration sets over integer sets.
The sample implementation contains the following files:
sets-impl.scm
- implementation of setsbags-impl.scm
- implementation of bagsisets-impl.scm
- implementation of integer setscsets-impl.scm
- implementation of character setsenums-impl.scm
- implementation of enums and enum typescount.scm
- macros for conveniently looping over bytevectorssrfi-4-shim.scm
- a trivial implementation of bytevectors on top of SRFI 4 u8vectorssets.sld
- an R7RS librarysets.scm
- a Chicken library Each *-impl
file is divided into a section that makes use of the underlying collection and a section that doesn't.
The file sets-test.scm
is a test suite for this library. It uses Chicken's test
egg, which is also provided on Chibi as an R7RS library named (chibi test)
.
Copyright (C) John Cowan 2013. All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.