This draft SRFI includes an "Issues" section, whose contents should be resolved before this proposal is accepted as a final SRFI.
This SRFI proposes a coherent and comprehensive set of procedures for manipulating lists as sets; it is accompanied by a reference implementation of the spec. The reference implementation is
Be aware that these procedures are inherently O(n^2) in the lengths of their parameter lists - serious set operations on large lists should use alternate techniques.
It may well be the case that the 46 procedures in this library should be folded into the general list library I have submitted as a prior SRFI proposal. This should be resolved in the course of discussing these two SRFIs.
Should we provide both reliably-destructive and linear-update
(potentially destructive) procedure variants? It really imposes no
implementation overhead to do so - typically, one simply
defines the linear-update name to be either the destructive or purely
functional procedure. We would need a naming convention for
linear-update functions - perhaps a terminal +
, e.g.
UNION
UNION!
UNION+
Procedures are, generally speaking, allowed to disorder lists. This is in keeping with their interpretation as sets. (The initial list parameter of union and adjoin gets special dispensation.)
It was too messy to define variants of LSET<=
and
LSET=
that are specialised for the usual
EQ?
, EQV?
and EQUAL?
element-equality functions, so I didn't.
The procedures in this library have "pure" and "linear update" variants. A
"pure" procedure has no side-effects, and in particular does not alter its
arguments in any way. A "linear update" procedure is allowed - but not
required - to side-effect its arguments in order to construct its
result. "Linear update" procedures are typically given names ending with an
exclamation point. So, for example, if the elements of
LIST1
and LIST2
are
disjoint, then (UNION! LIST1 LIST2)
is allowed to construct its result by
simply using SET-CDR!
to set the cdr of the last pair of LIST1
to point to
LIST2
, and then returning LIST1
. However, UNION!
could also elect to perform a
pure append operation - this is a legal definition of UNION!:
(define union! union)
This is why we do not call these procedures "destructive" - because
they aren't required to be destructive. They are
potentially
destructive.
What this means is that you may only apply linear-update procedures to values that you know are "dead" - values that will never be used again in your program. This must be so, since you can't rely on the value passed to a linear-update procedure after that procedure has been called. It might be unchanged; it might be altered.
There are 46 procedures - 13 basic procedures, and another 33 which
are variants specialised to use the primitive element-equality
procedures EQ?
, EQV?
and
EQUAL?
.
lset<= = list1 list2 ...
lset= = list1 list2 ...
lset-adjoin = list elt1 ...
lset-union = list1 ...
lset-intersection = list1 list2 ...
lset-difference = list1 list2 ...
lset-xor = list1 ...
lset-diff+intersection = list1 list2 ...
lset-union! lset-intersection! lset-difference!
lset-xor! lset-diff+intersection!
adjoin {,q,v}: list elt1 ...
union {,q,v} {,!}: list1 ...
intersection {,q,v} {,!}: list1 list2 ...
list-difference {,q,v} {,!}: list1 list2 ...
list-xor {,q,v} {,!}: list1 ...
diff+intersection {,q,v} {,!}: list1 list2 ...
lset<= = list1 list2 ... -> boolean
Returns true iff every LISTi
is a subset of
LISTi
+1, using =
for the element equality procedure.
(lset<= eq? '(a) '(a b a) '(a b c c)) => #t
lset= = list1 list2 ... -> boolean
Returns true iff every LISTi
is set-equal to
LISTi
+1, using =
for the element
equality procedure. "Set-equal" simply means that
LISTi
is a subset of LISTi
+1, and
LISTi
+1 is a subset of LISTi
.
(lset= eq? '(b e a) '(a e b) '(e e b a)) => #t
lset-adjoin = list elt1 ... -> list
Adds the ELTi
elements not already in the list
parameter to the result list. The result shares a common tail with
the list parameter. The new elements are added to the front of
the list, but no guarantees are made about their order. The
=
parameter is an equality procedure used to
determine if an ELTi
is already a member of
LIST
.
The list parameter is always a suffix of the result - even if the list parameter contains repeated elements, these are not reduced.
(lset-adjoin eq? '(a b c d c e) 'a 'e 'i 'o 'u) => (u o i a b c d c e)
lset-union = list1 ... -> list
Returns the union of the lists, using =
for the element equality
procedure. LIST1
is a suffix or tail of the result. Elements that are
repeated multiple times in a single list parameter may or may not occur
multiple times in the result; however, an element that occurs once in
more than one list parameter will only appear once in the result.
Elements introduced by LIST2
or following parameters occur in no
particular order.
(lset-union eq? '(a b c d e) '(a e i o u)) => (u o i a b c d e)
lset-intersection = list1 list2 ... -> list
Returns the intersection of the lists, using =
for
the element equality procedure. The result may share a common tail
with any of the list parameters. Elements that are repeated
multiple times in every list parameter may or may not occur
multiple times in the result; however, an element that occurs only
once in every list parameter will only appear once in the
result. No constraint is placed on the ordering of the elements in
the result.
(lset-intersection eq? '(a b c d e) '(a e i o u)) => (a e)
lset-difference = list1 list2 ... -> list
Returns the difference of the lists, using =
for the element equality
procedure - all the elements of LIST1
that do not appear in any of the
other LISTi
parameters. The result may share a common tail with LIST1
.
Elements that are repeated multiple times in the LIST1
parameter will
occur multiple times in the result. No constraint is placed on the ordering
of the elements in the result.
(lset-difference eq? '(a b c d e) '(a e i o u)) => (b c d)
lset-xor = list1 ... -> list
Returns the XOR of the lists, using =
for the element equality
procedure. If there are exactly two lists, this is all the elements
that appear in exactly one of the two lists. The operation is associative,
and thus extends to the n-ary case - the elements that appear in an
odd number of the lists. The result may share a common tail with any of
the LISTi
parameters.
A list element that occurs multiple times in a single list
parameter may or may not appear multiple times in the result. For
example, if LSET-XOR
is applied to a single list
parameter, it is permitted to return exactly that list
for its result. No constraint is placed on the ordering of the
elements in the result.
(lset-xor eq? '(a b c d e) '(a e i o u)) => (d c b i o u)
lset-diff+intersection = list1 list2 ... -> [list list]
Returns two values - the difference and the union of the lists. Is equivalent to
(values (lset-difference = list1 list2 ...)
(lset-intersection = list1 list2 ...))
but can be implemented more efficiently.
lset-union! = list1 ... -> list
lset-intersection! = list1 list2 ... -> list
lset-difference! = list1 list2 ... -> list
lset-xor! = list1 ... -> list
lset-diff+intersection! = list1 list2 ... -> [list list]
These are linear-update variants. They are allowed, but not
required, to use the cons cells in their first list parameter to
construct their answer. LSET-UNION!
is permitted to
recycle cons cells from any of its lists arguments.
EQUAL?
, EQ?
, and EQV?
for
element comparison, respectively.
union list1 ... -> list union! list1 ... -> list
unionq list1 ... -> list unionq! list1 ... -> list
unionv list1 ... -> list unionv! list1 ... -> list
intersection list1 list2 ... -> list intersection! list1 list2 ... -> list
intersectionq list1 list2 ... -> list intersectionq! list1 list2 ... -> list
intersectionv list1 list2 ... -> list intersectionv! list1 list2 ... -> list
list-xor list1 ... -> list list-xor! list1 ... -> list
list-xorq list1 ... -> list list-xorq! list1 ... -> list
list-xorv list1 ... -> list list-xorv! list1 ... -> list
adjoin list elt1 ... -> list
adjoinq list elt1 ... -> list
adjoinv list elt1 ... -> list
list-difference list1 list2 ... -> list
list-differenceq list1 list2 ... -> list
list-differencev list1 list2 ... -> list
list-difference! list1 list2 ... -> list
list-differenceq! list1 list2 ... -> list
list-differencev! list1 list2 ... -> list
diff+intersection list1 list2 ... -> list
diff+intersectionq list1 list2 ... -> list
diff+intersectionv list1 list2 ... -> list
diff+intersection! list1 list2 ... -> list
diff+intersectionq! list1 list2 ... -> list
diff+intersectionv! list1 list2 ... -> list
The "linear" in "linear update" doesn't mean "linear time" or "linear space" or any sort of multiple-of-n kind of meaning. It's a fancy term that pointy-headed type theorists and pure functional programmers use to describe systems where you are only allowed to have exactly one reference to each variable. This provides a guarantee that the value bound to a variable is bound to no other variable. So when you use a variable in a variable reference, you "use it up." Knowing that no one else has a pointer to that value means the a system primitive is free to side-effect its arguments to produce what is, observationally, a pure-functional result.
In the context of this library, "linear update" means you, the programmer, know there are no other live references to the value passed to the procedure - after passing the value to one of these procedures, the value of the old pointer is indeterminate. Basically, you are licensing the Scheme implementation to alter the data structure if it feels like it - you have declared you don't care either way.
You get no help from Scheme in checking that the values you claim are
"linear" really are. So you better get it right. Or play it safe and
use the non-!
procedures - doesn't do any good to
compute quickly if you get the wrong answer.
Why go to all this trouble to define the notion of "linear update" and use it in a procedure spec, instead of the more common notion of a "destructive" operation? First, note that destructive list-processing procedures are almost always used in a linear-update fashion. This is in part required by the special case of operating upon the empty list, which can't be side-effected. This means that destructive operators are not pure side-effects - they have to return a result. Second, note that code written using linear-update operators can be trivially ported to a pure, functional subset of Scheme by simply providing pure implementations of the linear-update operators. Finally, requiring destructive side-effects ruins opportunities to parallelise these operations - and the places where one has taken the trouble to spell out destructive operations are usually exactly the code one would want a parallelising compiler to parallelise: the efficiency-critical kernels of the algorithm. Linear-update operations are easily parallelised. Going with a linear-update spec doesn't close off these valuable alternative implementation techniques. This list library is intended as a set of low-level, basic operators, so we don't want to exclude these possible implementations.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.