217: Integer Sets

by John Cowan (text), Wolfgang Corcoran-Mathe (implementation)

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-217@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2020-11-14
Draft #1 published: 2020-11-29
Draft #2 published: 2021-01-30
Draft #3 published: 2021-02-03
Draft #4 published: 2021-02-08
Finalized: 2021-02-15
Revised to fix errata:
- 2021-09-14 (Not all results of iset-search are newly allocated.)

Abstract

Integer sets, or isets, are unordered collections of fixnums. (Fixnums are exact integers within certain implementation-specified bounds.)

Rationale

While it is perfectly practical to store integers in SRFI-113 sets, other algorithms can be used to represent sets of exact integers. This SRFI is almost a drop-in replacement for SRFI 113, except that set is replaced by iset in procedure names. However, comparators are not useful for integer sets, and in iset, iset-unfold, iset-map, and iset-copy, the comparator argument is omitted.

In addition, since exact integers are inherently ordered, this SRFI provides a number of procedures which have no direct equivalents in SRFI 113. These include:

make-range-iset, which creates an iset given a range of integers
iset-min and iset-max, which return the largest and smallest members
the interval procedures, which return subsets from an existing set that are within a given interval
the isubset procedures, which return subsets from an existing set that are greater than, less than, or equal to a given exact integer

Integer maps are naturally related to isets, and may be provided in a future SRFI.

Specification

Isets are disjoint from other types of Scheme objects.

It is an error to add or remove an object for an iset while iterating over it.

Linear update

The procedures of this SRFI, by default, are "pure functional" — they do not alter their parameters. However, this SRFI also defines "linear-update" procedures, all of whose names end in !. They have hybrid pure-functional/side-effecting semantics: they are allowed, but not required, to side-effect one of their parameters in order to construct their result. An implementation may legally implement these procedures as pure, side-effect-free functions, or it may implement them using side effects, depending upon the details of what is the most efficient or simple to implement in terms of the underlying representation.

It is an error to rely upon these procedures working by side effect. For example, this is not guaranteed to work:

        (let* ((iset1 (iset 1 2 3))      ; iset1 = {1,2,3}.
               (iset2 (iset-adjoin! iset1 4)))   ; Add 4 to {1,2,3}.
          iset1) ; Could be either {1,2,3} or {1,2,3,4}.

However, this is well-defined:

        (let ((iset1 (iset 1 2 3)))
          (iset-adjoin! iset1 4)) ; Add 4 to {1,2,3}.

So clients of these procedures write in a functional style, but must additionally be sure that, when the procedure is called, there are no other live pointers to the potentially-modified iset (hence the term "linear update").

There are two benefits to this convention:

Implementations are free to provide the most efficient possible implementation, either functional or side-effecting.
Programmers may nonetheless continue to assume that isets are purely functional data structures: they may be reliably shared without needing to be copied, uniquified, and so forth.

In practice, these procedures are most useful for efficiently constructing isets in a side-effecting manner, in some limited local context, before passing the iset outside the local construction scope to be used in a functional manner.

Scheme provides no assistance in checking the linearity of the potentially side-effected parameters passed to these functions — there's no linear type checker or run-time mechanism for detecting violations.

Note that if an implementation uses no side effects at all, it is allowed to return existing isets rather than newly allocated ones, even where this SRFI explicitly says otherwise.

Index

Constructors: iset, iset-unfold, make-range-iset
Predicates: iset?, iset-contains?, iset-empty?, iset-disjoint?
Accessors: iset-member, iset-min, iset-max
Updaters: iset-adjoin, iset-adjoin!, iset-delete, iset-delete!, iset-delete-all, iset-delete-all!, iset-search, iset-search!, iset-delete-min, iset-delete-min!, iset-delete-max, iset-delete-max!
The whole iset: iset-size, iset-find, iset-count, iset-any?, iset-every?
Mapping and folding: iset-map, iset-for-each, iset-fold, iset-fold-right, iset-filter, iset-filter!, iset-remove, iset-remove!, iset-partition, iset-partition!
Copying and conversion: iset-copy, iset->list, list->iset, list->iset!
Subsets: iset=?, iset<?, iset>?, iset<=?, iset>=?
Set theory operations: iset-union, iset-intersection, iset-difference, iset-xor, iset-union!, iset-intersection!, iset-difference!, iset-xor!
Intervals and ranges: iset-open-interval, iset-closed-interval, iset-open-closed-interval, iset-closed-open-interval, isubset=, isubset<, isubset<=, isubset>, isubset>=

Constructors

(iset element ... )

Returns a newly allocated iset. The elements are used to initialize the iset.

(iset->list (iset 2 3 5 7 11)) ⇒ (2 3 5 7 11)
(iset->list (iset)) ⇒ ()

(iset-unfold stop? mapper successor seed)

Create a newly allocated iset as if by iset. If the result of applying the predicate stop? to seed is true, return the iset. Otherwise, apply the procedure mapper to seed. The value that mapper returns is added to the iset. Then get a new seed by applying the procedure successor to seed, and repeat this algorithm.

(iset->list (iset-unfold (lambda (n) (> n 64))
                         values
                         (lambda (n) (* n 2))
                         2))
 ⇒ (2 4 8 16 32 64)

(make-range-iset start end [step])

Returns a newly allocated iset specified by an inclusive lower bound start, an exclusive upper bound end, and a step value (default 1), all of which are exact integers. This constructor produces an iset containing the sequence

start, (+ start step), (+ start (* 2 step)), …, (+ start (* n step)),

where n is the greatest integer such that (+ start (* n step)) < end if step is positive, or such that (+ start (* n step)) > end if step is negative. It is an error if step is zero.

(iset->list (make-range-iset 25 30)) ⇒ (25 26 27 28 29)
(iset->list (make-range-iset -10 10 6)) ⇒ (-10 -4 2 8)

Predicates

(iset? obj)

Returns #t if obj is a iset, and #f otherwise.

(iset-contains? iset element)

Returns #t if element is a member of iset and #f otherwise.

(iset-contains? (iset 2 3 5 7 11) 5) ⇒ #t
(iset-contains? (iset 2 3 5 7 11) 4) ⇒ #f

(iset-empty? iset)

Returns #t if iset has no elements and #f otherwise.

(iset-empty? (iset 2 3 5 7 11)) ⇒ #f
(iset-empty? (iset)) ⇒ #t

(iset-disjoint? iset₁ iset₂)

Returns #t if iset₁ and iset₂ have no elements in common and #f otherwise.

(iset-disjoint? (iset 1 3 5) (iset 0 2 4)) ⇒ #t
(iset-disjoint? (iset 1 3 5) (iset 2 3 4)) ⇒ #f

Accessors

(iset-member iset element default)

Returns the element of iset that is equal to element. If element is not a member of iset, default is returned.

(iset-member (iset 2 3 5 7 11) 7 #f) ⇒ 7
(iset-member (iset 2 3 5 7 11) 4 'failure) ⇒ failure

(iset-min iset)
(iset-max iset)

Returns the smallest or largest integer in iset, or #f if there is none.

(iset-min (iset 2 3 5 7 11)) ⇒ 2
(iset-max (iset 2 3 5 7 11)) ⇒ 11
(iset-max (iset)) ⇒ #f

Updaters

(iset-adjoin iset element₁ element₂ ...)

The iset-adjoin procedure returns a newly allocated iset that contains all the values of iset, and in addition each element unless it is already equal to one of the existing or newly added members.

(iset->list (iset-adjoin (iset 1 3 5) 0)) ⇒ (0 1 3 5)

(iset-adjoin! iset element₁ element₂ ...)

The iset-adjoin! procedure is the same as iset-adjoin, except that it is permitted to mutate and return the iset argument rather than allocating a new iset.

(iset-delete iset element₁ element₂ ...)

(iset-delete! iset element₁ element₂ ...)

(iset-delete-all iset element-list)

(iset-delete-all! iset element-list)

The iset-delete procedure returns a newly allocated iset containing all the values of iset except for any that are equal to one or more of the elements. Any element that is not equal to some member of the iset is ignored.

The iset-delete! procedure is the same as iset-delete, except that it is permitted to mutate and return the iset argument rather than allocating a new iset.

The iset-delete-all and iset-delete-all! procedures are the same as iset-delete and iset-delete!, except that they accept a single argument which is a list of elements to be deleted.

(iset->list (iset-delete (iset 1 3 5) 3)) ⇒ (1 5)
(iset->list (iset-delete-all (iset 2 3 5 7 11) '(3 4 5))) ⇒ (2 7 11)

(iset-delete-min iset)

(iset-delete-min! iset)

(iset-delete-max iset)

(iset-delete-max! iset)

Returns two values: the smallest/largest integer n in iset and a newly-allocated iset that contains all elements of iset except for n. It is an error if iset is empty.

The iset-delete-min! and iset-delete-max! procedures are the same as iset-delete-min and iset-delete-max, respectively, except that they are permitted to mutate and return the iset argument instead of allocating a new iset.

(let-values (((n set) (iset-delete-min (iset 2 3 5 7 11))))
  (list n (iset->list set)))
  ⇒ (2 (3 5 7 11))
(let-values (((n set) (iset-delete-max (iset 2 3 5 7 11))))
  (list n (iset->list set)))
  ⇒ (11 (2 3 5 7))

(iset-search iset element failure success)

The iset is searched from lowest to highest value for element. If it is not found, then the failure procedure is tail-called with two continuation arguments, insert and ignore, and is expected to tail-call one of them. If element is found, then the success procedure is tail-called with the matching element of iset and two continuations, update and remove, and is expected to tail-call one of them.

The effects of the continuations are as follows (where obj is any Scheme object):

Invoking (insert obj) causes element to be inserted into iset.
Invoking (ignore obj) causes iset to remain unchanged.
Invoking (update new-element obj) causes new-element to be inserted into iset in place of element.
Invoking (remove obj) causes the matching element of iset to be removed from it.

In all cases, two values are returned: an iset and obj.

(iset-search! iset element failure success)

The iset-search! procedure is the same as iset-search, except that it is permitted to mutate and return the iset argument rather than allocating a new iset.

The whole iset

(iset-size iset)

Returns the number of elements in iset as an exact integer.

(iset-size (iset 1 3 5)) ⇒ 3

(iset-find predicate iset failure)

Returns the smallest element of iset that satisfies predicate, or the result of invoking failure with no arguments if there is none.

(iset-find positive? (iset -1 1) (lambda () #f)) ⇒ 1
(iset-find zero? (iset -1 1) (lambda () #f)) ⇒ #f

(iset-count predicate iset)

Returns the number of elements of iset that satisfy predicate as an exact integer.

(iset-count positive? (iset -2 -1 1 2)) ⇒ 2

(iset-any? predicate iset)

Returns #t if any element of iset satisfies predicate, or #f otherwise. Note that this differs from the SRFI 1 analogue because it does not return an element of the iset.

(iset-any? positive? (iset -2 -1 1 2)) ⇒ #t
(iset-any? zero? (iset -2 -1 1 2)) ⇒ #f

(iset-every? predicate iset)

Returns #t if every element of iset satisfies predicate, or #f otherwise. Note that this differs from the SRFI 1 analogue because it does not return an element of the iset.

(iset-every? (lambda (x) (< x 5)) (iset -2 -1 1 2)) ⇒ #t
(iset-every? positive? (iset -2 -1 1 2)) ⇒ #f

Mapping and folding

(iset-map proc iset)

Applies proc to each element of iset in arbitrary order and returns a newly allocated iset, created as if by iset, which contains the results of the applications. It is an error if proc returns a value that is not an exact integer.

(iset-map (lambda (x) (* 10 x)) (iset 1 11 21))
     => (iset 10 110 210)

(iset-map (lambda (x) (quotient x 2))
         (iset 1 2 3 4 5))
 => (iset 0 1 2)

(iset-for-each proc iset)

Applies proc to iset in increasing numerical order, discarding the returned values. Returns an unspecified result.

(let ((sum 0))
  (iset-for-each (lambda (x) (set! sum (+ sum x)))
                 (iset 2 3 5 7 11))
  sum)
 ⇒ 28

(iset-fold proc nil iset)
(iset-fold-right proc nil iset)

Invokes proc on each member of iset in increasing/decreasing numerical order, passing the result of the previous invocation as a second argument. For the first invocation, nil is used as the second argument. Returns the result of the last invocation, or nil if there was no invocation.

(iset-fold + 0 (iset 2 3 5 7 11)) ⇒ 28
(iset-fold cons '() (iset 2 3 5 7 11)) ⇒ (11 7 5 3 2)
(iset-fold-right cons '() (iset 2 3 5 7 11)) ⇒ (2 3 5 7 11)

(iset-filter predicate iset)

Returns a newly allocated iset containing just the elements of iset that satisfy predicate.

(iset->list (iset-filter (lambda (x) (< x 6)) (iset 2 3 5 7 11)))
 ⇒ (2 3 5)

(iset-filter! predicate iset)

A linear update procedure that returns a iset containing just the elements of iset that satisfy predicate.

(iset-remove predicate iset)

Returns a newly allocated iset containing just the elements of iset that do not satisfy predicate.

(iset->list (iset-remove (lambda (x) (< x 6)) (iset 2 3 5 7 11)))
 ⇒ (7 11)

(iset-remove! predicate iset)

A linear update procedure that returns a iset containing just the elements of iset that do not satisfy predicate.

(iset-partition predicate iset)

Returns two values: a newly allocated iset that contains just the elements of iset that satisfy predicate and another newly allocated iset that contains just the elements of iset that do not satisfy predicate.

(let-values (((low high) (iset-partition (lambda (x) (< x 6))
                                         (iset 2 3 5 7 11))))
  (list (iset->list low) (iset->list high)))
 ⇒ ((2 3 5) (7 11))

(iset-partition! predicate iset)

A linear update procedure that returns two isets containing the elements of iset that do and do not, respectively, not satisfy predicate.

Copying and conversion

(iset-copy iset)

Returns a newly allocated iset containing the elements of iset.

(iset->list iset)

Returns a newly allocated list containing the members of iset in increasing numerical order.

(iset->list (iset 2 3 5 7 11)) ⇒ (2 3 5 7 11)

(list->iset list)

Returns a newly allocated iset, created as if by iset, that contains the elements of list. Duplicate elements are omitted.

(list->iset '(-3 -1 0 2)) = (iset -3 -1 0 2)

(list->iset! iset list)

Returns a iset that contains the elements of both iset and list. Duplicate elements are omitted. list->iset! may mutate iset rather than allocating a new iset.

(iset->list (list->iset! (iset 2 3 5) '(-3 -1 0))) ⇒ (-3 -1 0 2 3 5)

Subsets

Note: None of these predicates produces a total order on isets. In particular, iset=?, iset<?, and iset>? do not obey the trichotomy law.

(iset=? iset₁ iset₂ iset₃ ...)

Returns #t if each iset contains the same elements.

(iset<? iset₁ iset₂ iset₃ ...)

Returns #t if each iset other than the last is a proper subset of the following iset, and #f otherwise.

(iset>? iset₁ iset₂ iset₃ ...)

Returns #t if each iset other than the last is a proper superset of the following iset, and #f otherwise.

(iset<=? iset₁ iset₂ iset₃ ...)

Returns #t if each iset other than the last is a subset of the following iset, and #f otherwise.

(iset>=? iset₁ iset₂ iset₃ ...)

Returns #t if each iset other than the last is a superset of the following iset, and #f otherwise.

Examples:

(iset=? (iset 1 2 3) (iset 3 1 2)) ⇒ #t
(iset<? (iset 3 1 2) (iset 4 2 1 3)) ⇒ #t
(iset>=? (iset 3 0 1) (iset 0 1) (iset 0 1)) ⇒ #t

Set theory operations

(iset-union iset₁ iset₂ iset₃ ...)

(iset-intersection iset₁ iset₂ iset₃ ...)

(iset-difference iset₁ iset₂ iset₃ ...)

(iset-xor iset₁ iset₂)

Return a newly allocated iset that is the union, intersection, asymmetric difference, or symmetric difference of the isets. Asymmetric difference is extended to more than two isets by taking the difference between the first iset and the union of the others. Symmetric difference is not extended beyond two isets. Elements in the result iset are drawn from the first iset in which they appear.

(iset->list (iset-union (iset 0 1 3) (iset 0 2 4))) ⇒ (0 1 2 3 4)
(iset->list (iset-intersection (iset 0 1 3 4) (iset 0 2 4))) ⇒ (0 4)
(iset->list (iset-difference (iset 0 1 3 4) (iset 0 2) (iset 0 4))) ⇒ (1 3)
(iset->list (iset-xor (iset 0 1 3) (iset 0 2 4))) ⇒ (1 2 3 4)

(iset-union! iset₁ iset₂ iset₃ ...)

(iset-intersection! iset₁ iset₂ iset₃ ...)

(iset-difference! iset₁ iset₂ iset₃ ...)

(iset-xor! iset₁ iset₂)

Linear update procedures returning an iset that is the union, intersection, asymmetric difference, or symmetric difference of the isets. Asymmetric difference is extended to more than two isets by taking the difference between the first iset and the union of the others. Symmetric difference is not extended beyond two isets. Elements in the result iset are drawn from the first iset in which they appear.

Intervals and ranges

(iset-open-interval iset low high)

(iset-closed-interval iset low high)

(iset-open-closed-interval iset low high)

(iset-closed-open-interval iset low high)

Procedures that return a subset of iset contained in the interval from low to high. The interval may be open, closed, open below and closed above, or open above and closed below.

(iset->list (iset-open-interval (iset 2 3 5 7 11) 2 7)) ⇒ (3 5)
(iset->list (iset-closed-interval (iset 2 3 5 7 11) 2 7)) ⇒ (2 3 5 7)
(iset->list (iset-open-closed-interval (iset 2 3 5 7 11) 2 7)) ⇒ (3 5 7)
(iset->list (iset-closed-open-interval (iset 2 3 5 7 11) 2 7)) ⇒ (2 3 5)

(isubset= iset k)

(isubset< iset k)

(isubset<= iset k)

(isubset> iset k)

(isubset>= iset k)

Procedures that return an integer set containing the elements of iset that are equal to, less than, less than or equal to, greater than, or greater than or equal to k. Note that the result of isubset= contains at most one element.

(iset->list (isubset= (iset 2 3 5 7 11) 7)) ⇒ (7)
(iset->list (isubset< (iset 2 3 5 7 11) 7)) ⇒ (2 3 5)
(iset->list (isubset>= (iset 2 3 5 7 11) 7)) ⇒ (7 11)

Implementation

The sample implementation is found in the repository of this SRFI.

The implementation is based on the Patricia tree approach described by Chris Okasaki and Andrew Gill (paper linked in the implementation README), which is also used by Haskell's IntMap library. It provides fast lookup and set-theoretical operations.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Arthur A. Gleckler