Title

Sets and bags

Author

John Cowan

This SRFI is currently in "draft" status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 113 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Abstract

Sets and bags (also known as multisets) are unordered collections that can contain any Scheme object. Sets enforce the constraint that no two elements can be the same in the sense of the set's associated equality predicate; bags do not.

Rationale

Sets are a standard part of the libraries of many high-level programming languages, including Smalltalk, Java, and C++. Racket provides general sets similar to those of this proposal, though with fewer procedures, and there is a Chicken egg called sets (unfortunately undocumented), which provides a minimal set of procedures. SRFI 1 also provides a list-based implementation of sets.

Bags are useful for counting anything from a fixed set of possibilities, e.g. the number of each type of error in a log file or the number of uses of each word in a lexicon drawn from a body of documents. Although other data structures can serve the same purpose, using bags clearly expresses the programmer's intent and allows for optimization.

Insofar as possible, the names in this SRFI are harmonized with the names used for ordered collections (lists, strings, vectors, and bytevectors) in Scheme. However, size is used instead of length to express the number of elements in a collection, because length implies order.

It's possible to use the general sets of this SRFI to contain characters, but the use of SRFI 14 is recommended instead. The names and facilities in this SRFI are harmonized with SRFI 14, except that SRFI 14 does not contain analogues of the set-search!, set>?, set<=?, set>=?, set-remove, or set-partition procedures.

Sets and bags do not have a lexical syntax representation. It's possible to use SRFI 108 quasi-literal constructors to create them in code, but this SRFI does not standardize how that is done.

The interface to general sets and bags depends on SRFI 114 comparators, despite that SRFI having a higher number than this one for hysterical raisins. Comparators conveniently package the equality predicate of the set with the hash function or comparison procedure needed to implement the set efficiently.

Specification

Sets and bags are mutually disjoint, and disjoint from other types of Scheme objects.

It is an error for any procedure defined in this SRFI to be invoked on sets or bags with distinct comparators (in the sense of eq?).

It is an error to mutate any object while it is contained in a set or bag.

It is an error to add an object to a set or bag which does not satisfy the type test predicate of the comparator.

It is an error to add an object to, or remove an object from, a set or bag while iterating over it.

Linear update

The procedures of this SRFI, by default, are "pure functional" — they do not alter their parameters. However, this SRFI also defines "linear-update" procedures, all of whose names end in !. They have hybrid pure-functional/side-effecting semantics: they are allowed, but not required, to side-effect one of their parameters in order to construct their result. An implementation may legally implement these procedures as pure, side-effect-free functions, or it may implement them using side effects, depending upon the details of what is the most efficient or simple to implement in terms of the underlying representation.

It is an error to rely upon these procedures working by side effect. For example, this is not guaranteed to work:

        (let* ((set1 (set eq-comparator 'a 'b 'c))   ; set1 = {a,b,c}.
               (set2 (set-adjoin! set1 'd)))         ; Add d to {a,b,c}.
          set1) ; Could be either {a,b,c} or {a,b,c,d}.

However, this is well-defined:

        (let ((set1 (set eq-comparator 'a 'b 'c)))
          (set-adjoin! set1 'd)) ; Add d to {a,b,c}.

So clients of these procedures write in a functional style, but must additionally be sure that, when the procedure is called, there are no other live pointers to the potentially-modified set or bag (hence the term "linear update").

There are two benefits to this convention:

In practice, these procedures are most useful for efficiently constructing sets and bags in a side-effecting manner, in some limited local context, before passing the set or bag outside the local construction scope to be used in a functional manner.

Scheme provides no assistance in checking the linearity of the potentially side-effected parameters passed to these functions — there's no linear type checker or run-time mechanism for detecting violations.

Note that if an implementation uses no side effects at all, it is allowed to return existing sets and bags rather than newly allocated ones, even where this SRFI explicitly says otherwise.
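To illustrate the linear-update contract of this section, here is a minimal sketch in portable R7RS Scheme. It does not use this SRFI's procedures: toy-set, toy-adjoin!/pure, toy-adjoin!/mutating, and build are hypothetical names, and a record wrapping a list stands in for a set. The point is only that client code which uses the returned value, and nothing else, behaves the same whether the procedure mutates its argument or not.

```scheme
(import (scheme base))

;; A toy "set": a record wrapping a duplicate-free list.
;; Hypothetical, for illustration only; not the sample implementation.
(define-record-type toy-set
  (make-toy-set items)
  toy-set?
  (items toy-set-items set-toy-set-items!))

;; Add x to a duplicate-free list unless it is already present (eqv?).
(define (adjoin-items items x)
  (if (memv x items) items (cons x items)))

;; Variant A: pure. Allocates a new toy-set and leaves s untouched.
(define (toy-adjoin!/pure s x)
  (make-toy-set (adjoin-items (toy-set-items s) x)))

;; Variant B: side-effecting. Mutates s and returns it.
(define (toy-adjoin!/mutating s x)
  (set-toy-set-items! s (adjoin-items (toy-set-items s) x))
  s)

;; Client code that is correct under either variant: it always
;; continues with the returned value and drops the old reference.
(define (build adjoin!)
  (let loop ((s (make-toy-set '())) (xs '(a b a c)))
    (if (null? xs)
        s
        (loop (adjoin! s (car xs)) (cdr xs)))))

(define size-a (length (toy-set-items (build toy-adjoin!/pure))))
(define size-b (length (toy-set-items (build toy-adjoin!/mutating))))
```

Both size-a and size-b are 3 here. Code that kept and inspected an old reference after calling the procedure would see different results under the two variants, which is why relying on the side effect is an error.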

Comparator restrictions

Implementations of this SRFI are allowed to place restrictions on the comparators that the procedures accept. In particular, an implementation may require comparators to provide a comparison procedure. Alternatively, an implementation may require comparators to provide a hash function, unless the equality predicate of the comparator is eq?, eqv?, equal?, string=?, or string-ci=?. Implementations must not require the provision of both a comparison procedure and a hash function.

Index

Set procedures

Constructors

(set comparator element ... )

Returns a newly allocated set containing the elements; if no elements are given, the set is empty. The comparator argument is a SRFI 114 comparator, which is used to control and distinguish the elements of the set.

(set-unfold comparator stop? mapper successor seed)

Create a newly allocated set as if by set using comparator. If the result of applying the predicate stop? to seed is true, return the set. Otherwise, apply the procedure mapper to seed. The value that mapper returns is added to the set. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.
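The unfold algorithm just described can be sketched in portable R7RS Scheme. This is a model under stated assumptions, not the sample implementation: a duplicate-free list stands in for the set, an equality predicate same? stands in for the comparator, and list-adjoin and model-set-unfold are hypothetical names.

```scheme
(import (scheme base))

;; Adjoin x to a duplicate-free list under the equality predicate same?.
(define (list-adjoin same? lst x)
  (let loop ((rest lst))
    (cond ((null? rest) (cons x lst))
          ((same? (car rest) x) lst)
          (else (loop (cdr rest))))))

;; Model of set-unfold: stop when (stop? seed) is true; otherwise add
;; (mapper seed) to the result and continue with (successor seed).
(define (model-set-unfold same? stop? mapper successor seed)
  (let loop ((seed seed) (result '()))
    (if (stop? seed)
        result
        (loop (successor seed)
              (list-adjoin same? result (mapper seed))))))

;; Squares modulo 5 of the seeds 0 1 2 3 4, which map to 0 1 4 4 1.
(define squares-mod-5
  (model-set-unfold =
                    (lambda (n) (> n 4))
                    (lambda (n) (modulo (* n n) 5))
                    (lambda (n) (+ n 1))
                    0))
;; squares-mod-5 holds 0, 1 and 4 once each.
```

Five seeds are visited, but the result has only three members, because mapped values that are already present are not added again.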

Predicates

(set? obj)

Returns #t if obj is a set, and #f otherwise.

(set-contains? set element)

Returns #t if element is a member of set and #f otherwise.

(set-empty? set)

Returns #t if set has no elements and #f otherwise.

(set-disjoint? set1 set2)

Returns #t if set1 and set2 have no elements in common and #f otherwise.

Accessors

(set-member set element default)

Returns the element of set that is equal, in the sense of set's equality predicate, to element. If element is not a member of set, default is returned.

(set-element-comparator set)

Returns the comparator used to compare the elements of set.

Updaters

(set-adjoin set element ...)

(set-adjoin! set element ...)

The set-adjoin procedure returns a newly allocated set that uses the same comparator as set and contains all the values of set, and in addition each element unless it is already equal (in the sense of the comparator) to one of the existing or newly added members. It is an error to add an element to set that does not return #t when passed to the type test procedure of the comparator.

(set-replace set element)

(set-replace! set element)

The set-replace procedure returns a newly allocated set that uses the same comparator as set and contains all the values of set except as follows: If element is equal (in the sense of set's comparator) to an existing member of set, then that member is omitted and replaced by element. If there is no such element in set, then set is returned unchanged.

The set-adjoin! and set-replace! procedures are the same as set-adjoin and set-replace, except that they are permitted to mutate and return the set argument rather than allocating a new set.

(set-delete set element ...)

(set-delete! set element ...)

(set-delete-all set element-list)

(set-delete-all! set element-list)

The set-delete procedure returns a newly allocated set containing all the values of set except for any that are equal (in the sense of set's comparator) to one or more of the elements. Any element that is not equal to some member of the set is ignored.

The set-delete! procedure is the same as set-delete, except that it is permitted to mutate and return the set argument rather than allocating a new set.

The set-delete-all and set-delete-all! procedures are the same as set-delete and set-delete!, except that they accept a single argument which is a list of elements to be deleted.

(set-search! set element failure success)

The set is searched for element. If it is not found, then the failure procedure is tail-called with two continuation arguments, insert and ignore, and is expected to tail-call one of them. If element is found, then the success procedure is tail-called with the matching element of set and two continuations, update and remove, and is expected to tail-call one of them.

The effects of the continuations are as follows (where obj is any Scheme object):

In all cases, two values are returned: the possibly updated set and obj.

The whole set

(set-size set)

Returns the number of elements in set as an exact integer.

(set-find predicate set failure)

Returns an arbitrarily chosen element of set that satisfies predicate, or the result of invoking failure with no arguments if there is none.

(set-count predicate set)

Returns the number of elements of set that satisfy predicate as an exact integer.

(set-any? predicate set)

Returns #t if any element of set satisfies predicate, or #f otherwise. Note that this differs from the SRFI 1 analogue because it does not return an element of the set.

(set-every? predicate set)

Returns #t if every element of set satisfies predicate, or #f otherwise. Note that this differs from the SRFI 1 analogue because it does not return an element of the set.

Mapping and folding

(set-map comparator proc set)

Applies proc to each element of set in arbitrary order and returns a newly allocated set, created as if by (set comparator), which contains the results of the applications. For example:

        (set-map string-ci-comparator symbol->string (set eq-comparator 'foo 'bar 'baz))
            => (set string-ci-comparator "foo" "bar" "baz")

Note that, when proc defines a mapping that is not 1:1, some of the mapped objects may be equivalent in the sense of comparator's equality predicate, and in this case duplicate elements are omitted as in the set constructor. For example:

        (set-map integer-comparator
                 (lambda (x) (quotient x 2))
                 (set integer-comparator 1 2 3 4 5))
            => (set integer-comparator 0 1 2)

If several results are the same in the sense of comparator's equality predicate, it is unpredictable which one will be preserved in the result.

(set-for-each proc set)

Applies proc to each element of set in arbitrary order, discarding the returned values. Returns an unspecified result.

(set-fold proc nil set)

Invokes proc on each member of set in arbitrary order, passing the result of the previous invocation as a second argument. For the first invocation, nil is used as the second argument. Returns the result of the last invocation, or nil if there was no invocation.
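The argument convention of set-fold can be sketched with a list standing in for the set's members. This is a model in portable R7RS Scheme, and model-set-fold is a hypothetical name; a real set would present its members in some unspecified order.

```scheme
(import (scheme base))

;; Model of set-fold: proc receives the member first and the
;; accumulated result second; nil seeds the first invocation.
(define (model-set-fold proc nil members)
  (let loop ((acc nil) (rest members))
    (if (null? rest)
        acc
        (loop (proc (car rest) acc) (cdr rest)))))

(define total (model-set-fold + 0 '(1 2 3 4)))         ; sum of the members
(define collected (model-set-fold cons '() '(1 2 3)))  ; members as a list
(define empty-result (model-set-fold + 'none '()))     ; no invocation: nil
```

Because the traversal order of a real set is arbitrary, only order-insensitive uses such as the sum give a predictable result; the list built with cons contains the right members but in no guaranteed order.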

(set-filter predicate set)

Returns a newly allocated set with the same comparator as set, containing just the elements of set that satisfy predicate.

(set-filter! predicate set)

A linear update procedure that returns a set containing just the elements of set that satisfy predicate.

(set-remove predicate set)

Returns a newly allocated set with the same comparator as set, containing just the elements of set that do not satisfy predicate.

(set-remove! predicate set)

A linear update procedure that returns a set containing just the elements of set that do not satisfy predicate.

(set-partition predicate set)

Returns two values: a newly allocated set with the same comparator as set that contains just the elements of set that satisfy predicate, and another newly allocated set, also with the same comparator, that contains just the elements of set that do not satisfy predicate.

(set-partition! predicate set)

A linear update procedure that returns two sets containing the elements of set that do and do not, respectively, satisfy predicate.
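As a rough model of how filtering, removing, and partitioning relate, the following portable R7RS sketch partitions a list of members and returns two values. The name model-set-partition is hypothetical, and a list stands in for the set.

```scheme
(import (scheme base))

;; Model of set-partition: the first value holds what set-filter would
;; keep, the second what set-remove would keep.
(define (model-set-partition pred members)
  (let loop ((rest members) (in '()) (out '()))
    (cond ((null? rest) (values (reverse in) (reverse out)))
          ((pred (car rest)) (loop (cdr rest) (cons (car rest) in) out))
          (else (loop (cdr rest) in (cons (car rest) out))))))

;; Receive both results at once.
(define-values (evens odds)
  (model-set-partition even? '(1 2 3 4 5)))
```

Every member lands in exactly one of the two results, so their sizes always add up to the size of the original collection.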

Copying and conversion

(set-copy set)

Returns a newly allocated set containing the elements of set, and using the same comparator.

(set->list set)

Returns a newly allocated list containing the members of set in unspecified order.

(list->set comparator list)

Returns a newly allocated set, created as if by set using comparator, that contains the elements of list. Duplicate elements (in the sense of the equality predicate) are omitted.

(list->set! set list)

Returns a set that contains the elements of both set and list. Duplicate elements (in the sense of the equality predicate) are omitted.

Subsets

Note: The following predicates do not obey the trichotomy law and therefore do not constitute a total order on sets.

(set=? set1 set2 ...)

Returns #t if each set contains the same elements, and #f otherwise.

(set<? set1 set2 ...)

Returns #t if each set other than the last is a proper subset of the following set, and #f otherwise.

(set>? set1 set2 ...)

Returns #t if each set other than the last is a proper superset of the following set, and #f otherwise.

(set<=? set1 set2 ...)

Returns #t if each set other than the last is a subset of the following set, and #f otherwise.

(set>=? set1 set2 ...)

Returns #t if each set other than the last is a superset of the following set, and #f otherwise.
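The note about trichotomy can be made concrete with two overlapping sets. The sketch below is a two-argument model in portable R7RS Scheme, with lists of distinct numbers standing in for sets; subset? and the model-set... names are hypothetical.

```scheme
(import (scheme base))

;; #t if every member of a is also a member of b (compared with eqv?).
(define (subset? a b)
  (let loop ((rest a))
    (or (null? rest)
        (and (memv (car rest) b) (loop (cdr rest))))))

(define (model-set=? a b) (and (subset? a b) (subset? b a)))
(define (model-set<? a b) (and (subset? a b) (not (subset? b a))))
(define (model-set>? a b) (model-set<? b a))

(define a '(1 2))
(define b '(2 3))

;; Neither set contains the other, so none of the three relations holds.
(define comparable?
  (or (model-set<? a b) (model-set=? a b) (model-set>? a b)))
```

Here comparable? is #f: the two sets overlap without either containing the other, which is exactly the case a total order would forbid.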

Set theory operations

(set-union set1 set2 ...)

(set-intersection set1 set2 ...)

(set-difference set1 set2 ...)

(set-xor set1 set2)

Return a newly allocated set that is the union, intersection, asymmetric difference, or symmetric difference of the sets. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.
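The extension of asymmetric difference to several arguments, and the two-argument symmetric difference, can be sketched as follows. This is a list-based model in portable R7RS Scheme; keep, model-difference, and model-xor are hypothetical names and members are compared with eqv?.

```scheme
(import (scheme base))

;; Keep the members of lst that satisfy pred, preserving order.
(define (keep pred lst)
  (let loop ((rest lst) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((pred (car rest)) (loop (cdr rest) (cons (car rest) acc)))
          (else (loop (cdr rest) acc)))))

;; The first collection minus the union of all the others:
;; a member survives only if it appears in none of the others.
(define (model-difference first . others)
  (keep (lambda (x)
          (let loop ((rest others))
            (cond ((null? rest) #t)
                  ((memv x (car rest)) #f)
                  (else (loop (cdr rest))))))
        first))

;; Symmetric difference: members of exactly one of the two arguments.
(define (model-xor a b)
  (append (model-difference a b) (model-difference b a)))

(define d (model-difference '(1 2 3 4) '(2) '(4 5)))  ; 1 and 3 remain
(define x (model-xor '(1 2 3) '(3 4)))                ; 1, 2 and 4 remain
```

In the first call, 2 is removed by the second argument and 4 by the third, which matches subtracting the union of the later arguments from the first.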

(set-union! set1 set2 ...)

(set-intersection! set1 set2 ...)

(set-difference! set1 set2 ...)

(set-xor! set1 set2)

Linear update procedures returning a set that is the union, intersection, asymmetric difference, or symmetric difference of the sets. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.

(set-cartesian-product set1 set2 ... )

Return the cartesian product of the sets. The result is a set of lists constituting every possible combination of elements of the arguments, where the first element of each list is drawn from set1, the second element from set2, and so on. Note that the lists may share storage, so mutating them is an error even when they are removed from the result set. To get a set of sets instead, use set-map with the comparator set-comparator and the mapping procedure list->set. Note that there is no linear-update version of this procedure.
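A list-based sketch may help show both the shape of the result and why the resulting lists can share storage. It is written in portable R7RS Scheme; model-cartesian-product is a hypothetical name, and whether a real implementation shares tails in this way is an assumption.

```scheme
(import (scheme base))

;; Every combination of one member from each argument list.
;; Each tail computed for the later arguments is reused for every
;; member of the first argument, so result lists share structure.
(define (model-cartesian-product . lists)
  (if (null? lists)
      '(())
      (let ((tails (apply model-cartesian-product (cdr lists))))
        (apply append
               (map (lambda (x)
                      (map (lambda (tail) (cons x tail)) tails))
                    (car lists))))))

(define pairs (model-cartesian-product '(1 2) '(a b)))

;; The combinations (1 a) and (2 a) end in the very same pair object.
(define shared-tail?
  (eq? (cdr (list-ref pairs 0)) (cdr (list-ref pairs 2))))
```

In this model shared-tail? is #t, so mutating the tail of one combination would silently change another, which is the hazard the warning above guards against.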

(set-power-set set)

Return the power set of set. The result is a set of sets constituting every possible combination of the elements of set, from the null set to the set of all elements. The running time of this procedure is O(2^n), so beware of applying it to large sets. Note that there is no linear-update version of this procedure.
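The exponential growth is easy to see in a small list-based model, written here in portable R7RS Scheme. The name model-power-set is hypothetical, and lists stand in for both the set and its subsets.

```scheme
(import (scheme base))

;; Each subset of the remaining members appears twice in the result:
;; once without the first member and once with it.
(define (model-power-set members)
  (if (null? members)
      '(())
      (let ((rest (model-power-set (cdr members))))
        (append rest
                (map (lambda (subset) (cons (car members) subset))
                     rest)))))

(define ps (model-power-set '(a b c)))
(define ps-size (length ps))
```

Three members give eight subsets, and each additional member doubles the count, so even a set of a few dozen elements produces an impractically large result.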

Bag procedures

Bags are like sets, but can contain the same object more than once. However, if two elements that are the same in the sense of the equality predicate, but not in the sense of eqv?, are both included, it is not guaranteed that they will remain distinct when retrieved from the bag. It is an error for a single procedure to be invoked on bags with different comparators.

The procedures for creating and manipulating bags are the same as those for sets, except that set is replaced by bag in their names, and that adjoining an element to a bag is effective even if the bag already contains the element. (The bag version of set-power-set is bag-power-bag.) If two elements in a bag are the same in the sense of the bag's comparator, the implementation may in fact store just one of them.

The bag-union, bag-intersection, bag-difference, and bag-xor procedures (and their linear update analogues) behave as follows when both bags contain elements that are equal in the sense of the bags' comparator:

Additional bag procedures

(bag-sum bag1 bag2 ... )

(bag-sum! bag1 bag2 ... )

The bag-sum procedure returns a newly allocated bag containing all the unique elements in all the bags, such that the count of each unique element in the result is equal to the sum of the counts of that element in the arguments. It differs from bag-union by treating identical elements as potentially distinct rather than attempting to match them up.

The bag-sum! procedure is equivalent except that it is linear-update.

(bag-product n bag)

(bag-product! n bag)

The bag-product procedure returns a newly allocated bag containing all the unique elements in bag, where the count of each unique element in the result is equal to the count of that element in bag multiplied by n.

The bag-product! procedure is equivalent except that it is linear-update.
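The count arithmetic of bag-sum and bag-product can be sketched with association lists. This is a model in portable R7RS Scheme under a stated assumption: a bag is represented as a list of (element . count) pairs compared with eqv?. The names count-of, add-occurrences, model-bag-sum, and model-bag-product are hypothetical.

```scheme
(import (scheme base))

;; Number of occurrences of x in the modelled bag.
(define (count-of bag x)
  (let ((entry (assv x bag)))
    (if entry (cdr entry) 0)))

;; Add n occurrences of x, returning a new association list.
(define (add-occurrences bag x n)
  (cond ((null? bag) (list (cons x n)))
        ((eqv? (caar bag) x) (cons (cons x (+ n (cdar bag))) (cdr bag)))
        (else (cons (car bag) (add-occurrences (cdr bag) x n)))))

;; Model of bag-sum for two bags: counts of equal elements are added.
(define (model-bag-sum a b)
  (let loop ((rest b) (result a))
    (if (null? rest)
        result
        (loop (cdr rest)
              (add-occurrences result (caar rest) (cdar rest))))))

;; Model of bag-product: every count is multiplied by n.
(define (model-bag-product n bag)
  (map (lambda (entry) (cons (car entry) (* n (cdr entry)))) bag))

(define a '((x . 2) (y . 1)))
(define b '((x . 3) (z . 1)))
(define summed (model-bag-sum a b))       ; x: 5, y: 1, z: 1
(define doubled (model-bag-product 2 a))  ; x: 4, y: 2
```

The element x occurs twice in the first bag and three times in the second, so the sum holds it five times; the product simply scales each count.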

(bag-unique-size bag)

Returns the number of unique elements of bag.

(bag-element-count bag element)

Returns an exact integer representing the number of times that element appears in bag.

(bag-for-each-unique proc bag)

Applies proc to each unique element of bag in arbitrary order, passing the element and the number of times it occurs in bag, and discarding the returned values. Returns an unspecified result.

(bag-fold-unique proc nil bag)

Invokes proc on each unique element of bag in arbitrary order, passing the number of occurrences as a second argument and the result of the previous invocation as a third argument. For the first invocation, nil is used as the third argument. Returns the result of the last invocation.

(bag-increment! bag element count)

(bag-decrement! bag element count)

Linear update procedures that return a bag with the same elements as bag, but with the element count of element in bag increased or decreased by the exact integer count (but not less than zero).

(bag->set bag)

(set->bag set)

(set->bag! bag set)

The bag->set procedure returns a newly allocated set containing the unique elements (in the sense of the equality predicate) of bag. The set->bag procedure returns a newly allocated bag containing the elements of set. The set->bag! procedure returns a bag containing the elements of both bag and set. In all cases, the comparator of the result is the same as the comparator of the argument or arguments.

(bag->alist bag)

(alist->bag comparator alist)

The bag->alist procedure returns a newly allocated alist whose keys are the unique elements of bag and whose values are the number of occurrences of each element. The alist->bag procedure returns a newly allocated bag based on comparator, where the keys of alist specify the elements and the corresponding values of alist specify how many times they occur.

Comparators

The following comparators are used to compare sets or bags, and allow sets of sets, bags of sets, etc.

set-comparator

bag-comparator

Note that these comparators do not provide comparison procedures, as there is no ordering between sets or bags. It is an error to compare sets or bags with different element comparators.

Implementation

The implementation places the identifiers defined above into the sets library.

Sets and bags are implemented as a thin veneer over hashtables.

The implementation registers set-comparator and bag-comparator with SRFI 114's default comparator, assuming the sample implementation of SRFI 114 is being used. Scheme implementers who provide their own implementations of SRFI 114 must change this part of the code.

The sample implementation contains the following files:

The test suite will work with the Chicken test egg, which is provided on Chibi as the (chibi test) library.

Copyright

Copyright (C) John Cowan 2013. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Mike Sperber