Re: LAST CALL for SRFI-113, Sets and bags

This page is part of the web mail archives of SRFI 113 from before July 7th, 2015. The new archives for SRFI 113 contain all messages, not just those from before July 7th, 2015.

Yes, let's add a bag-sum and bag-sum!. I forgot that multiset union and summation were different operations.

This makes defining alist->bag! difficult; there are two ways of merging bags and it is unclear which should be used. I am onboard with just removing alist->bag!. The only downside I can see is that some clients get forced to use a pure procedure where a linear-update one may've sufficed, which adds a modest constant factor, which seems acceptable.

I've found myself coding up my own Cartesian product several times, so I support adding a procedure for that. A list (a b) seems like the right data structure for each of the resulting tuples. I suggest standardizing an n-ary product, not just binary, so each tuple has as many elements as the number of set multiplicands.

Another operation worth considering is power set. It's useful, but is O(2^n), so it would need a disclaimer about that.

Kevin Wortman

On Wed, Aug 20, 2014 at 6:28 AM, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:

Kevin Wortman scripsit:

> I think the alist conversion procedures are important since they allow bags
> to be defined in program text and serialized with read and write.

I agree, and there is no question of flushing bag->alist or alist->bag.
It's only alist->bag! whose consistency and utility I'm beginning to
question.

> I guess the pertinent question is: does ((a . 1) (1 . 2)) mean (1) "1 copy
> of a, and then *an additional* 2 copies of a"; or (2) "currently the count
> of a's is 1; in the past the count for a was 2." The semantics of assoc
> make it seem that conventionally, (2) is how alists with duplicate entries
> are interpreted. Which I think is consistent with your first definition for
> how alist->bag and alist->bag! should work.

I agree that assoc semantics should dominate, and all entries after the
first should be ignored. The question then is: given an implicit bag
defined by an alist with duplicate entries ignored, what is the semantics
of merging this into an existing bag? Is it union or sum or replacement?
It's not obvious to me that either semantics is a clear winner.

So perhaps we should only standardize alist->bag, and leave bag merger to
the user. This also suggests that we need a procedure bag-sum(!), with
semantics {a, a, a} + {a, a} = {a, a, a, a, a}. The current definition
of bag-union gives {a, a, a} on the assumption that the elements are
indistinguishable, whereas bag-sum would treat them as distinguishable.
Checking Wikipedia s.v. Multiset, I find that sum is defined precisely
this way, as well as scalar product: {a, a, a} x 2 = {a, a, a, a, a, a}.

Lastly, I think we should have cartesian product of both sets and bags.
I don't know how I came to overlook it before, probably because none of
the other libraries I have looked at (Scheme, Lisp, Python, Haskell,
etc.) have given it. It is a fundamental set operation, however.
I'll assume that the tuples (a,b) in the product should be represented
in Scheme as (a b), unless someone has strong contrary feelings.

--
John Cowan http://www.ccil.org/~cowan cowan@xxxxxxxx

"The exception proves the rule." Dimbulbs think: "Your counterexample proves
my theory." Latin students think "'Probat' means 'tests': the exception puts
the rule to the proof." But legal historians know it means "Evidence for an
exception is evidence of the existence of a rule in cases not excepted from."