Title

List-Set Library

Author

Olin Shivers

Status

This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-3@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

This draft SRFI includes an "Issues" section, whose contents should be resolved before this proposal is accepted as a final SRFI.

Abstract

This SRFI proposes a coherent and comprehensive set of procedures for manipulating lists as sets; it is accompanied by a reference implementation of the spec. The reference implementation is

Be aware that these procedures are inherently O(n^2) in the lengths of their parameter lists - serious set operations on large lists should use alternate techniques.

Issues

It may well be the case that the 46 procedures in this library should be folded into the general list library I have submitted as a prior SRFI proposal. This should be resolved in the course of discussing these two SRFIs.

Should we provide both reliably-destructive and linear-update (potentially destructive) procedure variants? It really imposes no implementation overhead to do so - typically, one simply defines the linear-update name to be either the destructive or purely functional procedure. We would need a naming convention for linear-update functions - perhaps a terminal +, e.g.

UNION UNION! UNION+

Procedures are, generally speaking, allowed to disorder lists. This is in keeping with their interpretation as sets. (The initial list parameter of union and adjoin gets special dispensation.)

It was too messy to define variants of LSET<= and LSET= that are specialised for the usual EQ?, EQV? and EQUAL? element-equality functions, so I didn't.

Rationale

The procedures in this library have "pure" and "linear update" variants. A "pure" procedure has no side-effects, and in particular does not alter its arguments in any way. A "linear update" procedure is allowed - but not required - to side-effect its arguments in order to construct its result. "Linear update" procedures are typically given names ending with an exclamation point. So, for example, if the elements of LIST1 and LIST2 are disjoint, then (UNION! LIST1 LIST2) is allowed to construct its result by simply using SET-CDR! to set the cdr of the last pair of LIST1 to point to LIST2, and then returning LIST1. However, UNION! could also elect to perform a pure append operation - this is a legal definition of UNION!:

(define union! union)

This is why we do not call these procedures "destructive" - because they aren't required to be destructive. They are potentially destructive.

What this means is that you may only apply linear-update procedures to values that you know are "dead" - values that will never be used again in your program. This must be so, since you can't rely on the value passed to a linear-update procedure after that procedure has been called. It might be unchanged; it might be altered.

Background on linear update.

Specification

There are 46 procedures - 13 basic procedures, and another 33 which are variants specialised to use the primitive element-equality procedures EQ?, EQV? and EQUAL?.

Procedure List

Procedure Specification

lset<= = list1 list2 ... -> boolean

Returns true iff every LISTi is a subset of LISTi+1, using = for the element equality procedure.

(lset<= eq? '(a) '(a b a) '(a b c c)) => #t

lset= = list1 list2 ... -> boolean

Returns true iff every LISTi is set-equal to LISTi+1, using = for the element equality procedure. "Set-equal" simply means that LISTi is a subset of LISTi+1, and LISTi+1 is a subset of LISTi.

(lset= eq? '(b e a) '(a e b) '(e e b a)) => #t

lset-adjoin = list elt1 ... -> list

Adds the ELTi elements not already in the list parameter to the result list. The result shares a common tail with the list parameter. The new elements are added to the front of the list, but no guarantees are made about their order. The = parameter is an equality procedure used to determine if an ELTi is already a member of LIST.

The list parameter is always a suffix of the result - even if the list parameter contains repeated elements, these are not reduced.

(lset-adjoin eq? '(a b c d c e) 'a 'e 'i 'o 'u) => (u o i a b c d c e)

lset-union = list1 ... -> list

Returns the union of the lists, using = for the element equality procedure. LIST1 is a suffix or tail of the result. Elements that are repeated multiple times in a single list parameter may or may not occur multiple times in the result; however, an element that occurs once in more than one list parameter will only appear once in the result.

Elements introduced by LIST2 or following parameters occur in no particular order.

(lset-union eq? '(a b c d e) '(a e i o u)) => (u o i a b c d e)

lset-intersection = list1 list2 ... -> list

Returns the intersection of the lists, using = for the element equality procedure. The result may share a common tail with any of the list parameters. Elements that are repeated multiple times in every list parameter may or may not occur multiple times in the result; however, an element that occurs only once in every list parameter will only appear once in the result. No constraint is placed on the ordering of the elements in the result.

(lset-intersection eq? '(a b c d e) '(a e i o u)) => (a e)

lset-difference = list1 list2 ... -> list

Returns the difference of the lists, using = for the element equality procedure - all the elements of LIST1 that do not appear in any of the other LISTi parameters. The result may share a common tail with LIST1. Elements that are repeated multiple times in the LIST1 parameter will occur multiple times in the result. No constraint is placed on the ordering of the elements in the result.

(lset-difference eq? '(a b c d e) '(a e i o u)) => (b c d)

lset-xor = list1 ... -> list

Returns the XOR of the lists, using = for the element equality procedure. If there are exactly two lists, this is all the elements that appear in exactly one of the two lists. The operation is associative, and thus extends to the n-ary case - the elements that appear in an odd number of the lists. The result may share a common tail with any of the LISTi parameters.

A list element that occurs multiple times in a single list parameter may or may not appear multiple times in the result. For example, if LSET-XOR is applied to a single list parameter, it is permitted to return exactly that list for its result. No constraint is placed on the ordering of the elements in the result.

(lset-xor eq? '(a b c d e) '(a e i o u)) => (d c b i o u)

lset-diff+intersection = list1 list2 ... -> [list list]

Returns two values - the difference and the union of the lists. Is equivalent to


(values (lset-difference = list1 list2 ...)
        (lset-intersection = list1 list2 ...))

but can be implemented more efficiently.

lset-union! = list1 ... -> list
lset-intersection! = list1 list2 ... -> list
lset-difference! = list1 list2 ... -> list
lset-xor! = list1 ... -> list
lset-diff+intersection! = list1 list2 ... -> [list list]

These are linear-update variants. They are allowed, but not required, to use the cons cells in their first list parameter to construct their answer. LSET-UNION! is permitted to recycle cons cells from any of its lists arguments.

The following variants of the general lset procedures use EQUAL?, EQ?, and EQV? for element comparison, respectively.

union   list1 ... -> list		union!  list1 ... -> list
unionq  list1 ... -> list		unionq! list1 ... -> list
unionv  list1 ... -> list		unionv! list1 ... -> list

intersection   list1 list2 ... -> list	intersection!  list1 list2 ... -> list
intersectionq  list1 list2 ... -> list	intersectionq! list1 list2 ... -> list
intersectionv  list1 list2 ... -> list	intersectionv! list1 list2 ... -> list

list-xor   list1 ... -> list		list-xor!  list1 ... -> list
list-xorq  list1 ... -> list		list-xorq! list1 ... -> list
list-xorv  list1 ... -> list		list-xorv! list1 ... -> list

adjoin  list elt1 ... -> list
adjoinq list elt1 ... -> list
adjoinv list elt1 ... -> list

list-difference   list1 list2 ... -> list	
list-differenceq  list1 list2 ... -> list	
list-differencev  list1 list2 ... -> list	
list-difference!  list1 list2 ... -> list
list-differenceq! list1 list2 ... -> list
list-differencev! list1 list2 ... -> list

diff+intersection   list1 list2 ... -> list
diff+intersectionq  list1 list2 ... -> list
diff+intersectionv  list1 list2 ... -> list
diff+intersection!  list1 list2 ... -> list
diff+intersectionq! list1 list2 ... -> list
diff+intersectionv! list1 list2 ... -> list

Implementation

Source for the reference implementation.

Linear Update

The "linear" in "linear update" doesn't mean "linear time" or "linear space" or any sort of multiple-of-n kind of meaning. It's a fancy term that pointy-headed type theorists and pure functional programmers use to describe systems where you are only allowed to have exactly one reference to each variable. This provides a guarantee that the value bound to a variable is bound to no other variable. So when you use a variable in a variable reference, you "use it up." Knowing that no one else has a pointer to that value means the a system primitive is free to side-effect its arguments to produce what is, observationally, a pure-functional result.

In the context of this library, "linear update" means you, the programmer, know there are no other live references to the value passed to the procedure - after passing the value to one of these procedures, the value of the old pointer is indeterminate. Basically, you are licensing the Scheme implementation to alter the data structure if it feels like it - you have declared you don't care either way.

You get no help from Scheme in checking that the values you claim are "linear" really are. So you better get it right. Or play it safe and use the non-! procedures - doesn't do any good to compute quickly if you get the wrong answer.

Why go to all this trouble to define the notion of "linear update" and use it in a procedure spec, instead of the more common notion of a "destructive" operation? First, note that destructive list-processing procedures are almost always used in a linear-update fashion. This is in part required by the special case of operating upon the empty list, which can't be side-effected. This means that destructive operators are not pure side-effects - they have to return a result. Second, note that code written using linear-update operators can be trivially ported to a pure, functional subset of Scheme by simply providing pure implementations of the linear-update operators. Finally, requiring destructive side-effects ruins opportunities to parallelise these operations - and the places where one has taken the trouble to spell out destructive operations are usually exactly the code one would want a parallelising compiler to parallelise: the efficiency-critical kernels of the algorithm. Linear-update operations are easily parallelised. Going with a linear-update spec doesn't close off these valuable alternative implementation techniques. This list library is intended as a set of low-level, basic operators, so we don't want to exclude these possible implementations.

Copyright

Copyright (C) Olin Shivers (1998). All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Mike Sperber
Last modified: Sun Jan 28 13:40:31 MET 2007