[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Traits, dictionaries

> Bradd W. Szonye wrote:
>> 1. Bidirectionality and folding

scgmille@xxxxxxxxxxxxxxxxxx wrote:
> Perhaps the real solution is to make it optional that an ordered
> collection support a reverse fold in addition to a
> bidirectional-collection attribute.

Yes, that would be the simplest way to do it. I've been thinking about
how to deal with this without too much impact -- I don't want to hold up
finalization too much. Your SRFI is very close to having excellent
support for generic programming, but the collection "traits" fall just a
little bit short. (Also, I have some concerns about dictionaries; see
the end of the message.)

In generic programming, it's important for data structures to support
two kinds of traits: feature tests and performance traits. The feature
tests are also important from the OO point of view, because they're
effectively a kind of classification (inheritance)! The SRFI already
reflects this partially, by treating ordered collections as a "class" in
some contexts. In other words, bag? and ordered-collection? are both
classes (from the OO point of view), and they're both feature traits
(from the GP point of view).

Methods like ordered-collection? and bag? are feature traits; they tells
you which operations are valid on a collection. Methods like
bidirectional-collection? describe performance traits; they tell you
whether a given operation is "reasonable" for regular use.

While I recommend the inclusion of some performance traits, they're not
absolutely necessary. If you think it's too much trouble to include
them, they could always be postponed to a "generic programming" SRFI
based on SRFI-44.

>> For most unidirectional collections, collection-reverse-fold could be
>> an alias for collection-fold. For some of them, like unidirectional
>> sequences (i.e., lists), it can provide an inefficient reverse fold.

> I don't like that as much.  Reverse fold should be undefined if its
> semantics are the same as ordinary fold.

On second thought, you're correct. I also made the mistake of conflating
two different traits. There's a difference between asking whether you
*can* reverse a collection and asking whether you *should*.

> For sequences it should probably be defined even if inefficiently.
> The programmer can use the bidirectional attribute to figure out if
> this is going to be cost prohibitive.

Right! What we actually need is a feature trait (reversible-collection?)
and a performance trait (bidirectional-collection?). The former tells
you whether you can reverse the collection; the latter tells you whether
it's inexpensive. Vectors and proper lists are both reversible
collections, but only the vector is bidirectional.

I think the SRFI is only missing two important feature traits:

    reversible-collection?      supports collection-reverse-fold
    finite-collection?          collection-fold will halt on its own

Justification: Consider the sequence "fibonacci," which generates its
values instead of storing them. The Nth index of the sequence is the
(N+1)th fibonacci number. This collection is ordered but not finite or
reversible. You could use (sequence-ref fibonacci 3) to get the 4th
fibonacci number. To get a subsequence, you can fold the collection:

    (define (fibonacci-subsequence start end)
      (define (f n v s e acc)
         (cond? ((< n s) (values #t s e acc))
                ((<= n e) (values #t s e (cons v acc)))
                (else (values #f s e acc)))
      (let-values ((s e acc)
                     (collection-fold-keys fibonacci f start end '()))
        (reverse acc))))

This gives us generated sequences like Clean and ML! (A later SRFI could
extend your collection classes to standardize the interface for this
kind of sequence.)

Of course, you can't reverse fibonacci, even though it's ordered, so
reversible-collection? must be false. Likewise, folder-procs must
explicitly halt enumeration, so finite-collection? must be false.

This sort of thing is also handy for collections like circular lists,
wrap-around vectors, and similar collections.

I also think it would be a good idea to include a few performance
traits, like bidirectional-collection? and random-access-collection?,
but you could leave that to a future GP SRFI if you think it's
unecessary. However, it would be a good idea to at least explain the
concept of performance traits, especially since you already have one:
The mutable-collection? trait is essentially a performance trait. It
tells you to prefer the *-proc!! interfaces over the functional
interfaces. (By the way, I agree with Matthias Radestock: The ! and !!
procedures seem redundant. Do you know of any case where it's important
to actually distinguish between the two? It seems to me that you could
always trivially implement ! by returning the result of !!.) It's also a
feature trait, because it tells you about the interface, even if you
merge ! and !!: It tells you that you're allowed to ignore the returned
value of the mutable methods, because it's always eq? to the input

Summary of my opinions on collection traits:

1. The reversible-collection? and finite-collection? feature traits are
   NECESSARY to implement several desirable collections.

2. The bidirectional-collection? and random-access-collection?
   performance traits would be HELPFUL for generic programming.

3. I recommend merging the ! and !! procedures unless there's a strong
   argument not to.

>> 2. Equivalence of dictionaries

> This is a weakness in the wording I imagine.  Dictionaries compare
> keys and values using the same equivalence function, which takes two
> values at a time, not key/value pairs.  The equivalence function would
> be applied to two keys, one from each dictionary, then two the two
> values from each dictionary mapping.

That's a reasonable default, but it must not be a requirement, because
it makes some dictionaries impossible to implement!

For example, consider a dictionary that maps case-insensitive user IDs
to case-sensitive user names. There's no straightforward way to
implement this. You could do it by encapsulating the keys or values in
some other data structure, but why? That's just a hack to accommodate an
inflexible interface.

It would be better to have separate equivalence predicates for the keys
and the values (if necessary). Here's how I'd spec the constructor:

    (unordered-dictionary [key-eqv? [val-eqv?]] pair ...)
    (ordered-dictionary [less?] [val-eqv?] pair ...)

For unordered dictionaries, key-eqv? and val-eqv? both default to eqv?.
Ordered dictionaries are similar, except that they can derive key-eqv?
from less?, as long as it's a strict weak order (which includes < but
not partial orders like <=).

This complicates *-equivalence-function, since there's now more than one
of them. I can think of three ways to deal with that:

1. *-equivalence-function d => key-eqv? val-eqv?  -or-
   *-equivalence-function d => (cons key-eqv? val-eqv?)

Bad idea, because it breaks the Liskov substitution rule for
polymorphism. (A subtype's methods should be more generous about inputs
and stricter about outputs, which is false for this method. By the way,
this is a good reason *not* to make sequence a subtype of dictionary --
while all sequences are a kind of dictionary, the interface would break
the Liskov substitution rule.)

2. Introduce *-key-equivalence-function and *-value-equivalence-function,
   with *-equivalence-function => *-key-equivalence-function.

Question: (collection-fold dict) enumerates the values, right? That
seems like the right thing to do, because it makes for an interface
similar to sequences. However, there are also good reasons to enumerate
the keys or to enumerate key-value pairs (i.e., (cons key value)). I
keep arguing with myself over what's the best "default" enumeration for
dictionaries, and I can't make up my mind.

Anyway, assuming that it does enumerate values, there's some potential
here for confusion: The "default" enumerator operates on values, but the
"default" equivalence function operates on keys. I can think of two ways
to fix *that*:

2a. As above, but *-equivalence-function => *-value-equivalence-function.

Bad idea, because the equivalence function is supposed to "return true
if they should be considered equivalent for the purposes of contains?,
value insertion, removal, and key lookup."

2b. As above, (collection-fold dict) enumerates key-value pairs, and
    *-equivalence-function tests two pairs for equivalence.

That is, the "equivalence function" is a composition:
(lambda (a b) (and (key-eqv? (car a) (car b)) (val-eqv? (cdr a) (cdr b))))

This is better than 2a, but it still isn't really testing for key
equivalence. The basic problem here is that there are at least two major
uses of the equivalence function: dealing with keys within a single
collection, and comparing two entire collections.

I think the best solution is #2: Introduce explicit key-eqv? and
val-eqv? accessors, and have the "default" be the key comparator.
Generic algorithms will therefore need to be careful about combining
collection-fold with *-equivalence-function, but I think that's the
least of all evils in this case.

3. Get rid of *-equivalence-function for dictionaries.

If you don't like any of the alternatives, you could remove the accessor
entirely. I don't think it's necessary, but it is a viable option.

Summary of my opinions on dictionaries:

1. Dictionaries MUST have separate key-eqv? and val-eqv? predicates.

2. Ordered dictionaries SHOULD derive key-eqv? from the ordering
   predicate whenever possible. Specifically, they should encourage
   strict weak orders and discourage partial orders, which makes the
   less->eqv? conversion possible.

3. A dictionary's *-equivalence-function SHOULD return the key
   comparator. Dictionaries should have at least one additional accessor
   to obtain the value comparator.
Bradd W. Szonye