[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [oleg@xxxxxxxxx: Interface view of dictionaries]



> Bradd W. Szonye wrote:
>> However, there's a significant class of collections -- any tree-based
>> dictionary, for example -- where you can improve from O(N)
>> performance to O(lg N + K) performance for sequential access to keys.
>> That's both efficient and generic.

scgmille@xxxxxxxxxxxxxxxxxx wrote:
> I agree, and I expect a tree implementation of dictionaries to specify
> exactly those operators.  But they don't necessarily extend to
> hashtables, disk file databases, or any number of other dictionary
> mappable datastructures... a set which is larger than the set of Tree
> dictionaries.

That's when you ask, "Are these interfaces actually inefficient for
other kinds of dictionaries? They're still convenient, so is there a
good reason not to provide them?"

Consider procedures that operate on K sequential elements, starting with
a given key. It's a common operation for dictionaries that permit
duplicate keys; you often need to enumerate all elements that match a
given key.

You can always use the FOLD primitive to implement these operations, but
it runs in O(N) time. (If you need to mutate a collection with an
unstable enumerator, that increases to O(N-squared) time.) If you
provide the interface as an atomic operation, however, a typical
tree dictionary can cut that to O(lg N + K), and a typical hash table
dictionary can cut it to O(K). Alists don't give you anything special,
but they do still match the enumerator's O(N), and it may be able to
the avoid O(N-squared) penalty for unstable mutations, if you implement
it atomically.

Do you know of any common dictionary types that would actually do worse
with the atomic operation? (Given that you can always implement it with
a full enumeration, I doubt it.)

>> And Bear is trying to establish that some of these functions are not
>> only common but important extensions. For most dictionary-type
>> collections, primitives aren't sufficient. There's a convenience
>> argument *and* a performance argument for extending the set.

> But only if those operators generalize to all dictionaries.

They'd better, or it'll be very difficult to deal with duplicate keys.
IIRC, I mentioned this in my last review of dictionaries; the interface
doesn't even have GET-ANY to fetch all matching elements. That's an
important interface for dictionaries with non-unique keys.

> Otherwise they impair the elegance and utility of the API that is
> general to all dictionaries.

Pardon my vulgarity, but screw elegance. It means different things to
everyone, and it's the doom of many a design. The overzealous desire for
so-called elegance is one of the things that leads to what Bear called
APIs for stupid programs.

I see this desire for elegance in designers of all kinds, and it almost
always reflects the author's unreasoning infatuation with some design,
whether it's actually useful or not. In my opinion, "elegance" should
never, ever be a design goal; it's a primrose path that leads to Hell.

Oleg wrote:
>>> The intent is not to make it unnecessary to add more API in future
>>> SRFIs. Rather, the intent is to make it unnecessary to remove
>>> SRFI-44 features in future SRFIs.

>> Why would you need to remove a function that is more convenient and more
>> efficient?

> Because they may not apply to a hypothetical collection.

I'm sorry, what sort of dictionary does "fetch K elements from key" not
apply to? This is a perfect example of being too attached to "elegance"
and too willing to throw away useful concepts.

>> Also, speaking of things which might need to be removed, I'm
>> concerned about the object-oriented nature of the containers .... For
>> example, the hierarchy looks like it's based on a traditional
>> "objects as references" class-based system, when an "objects as
>> values" prototype-based system may fit better into Scheme.
>> (Especially given the way such a system incorporates primitive data
>> structures according to content rather than type).

> Hardly.  The SRFI was defined before there was even an object system
> selected for the reference implementation.  We were very careful not
> to evey specify polymorphism.  The SRFI only requires that the more
> general functions work for the 'subtypes' of the SRFI.  It never says
> that the collections actually be subtypes in some object system, its
> just a useful metaphor.

But the metaphor reeks of class-based, objects-as-references
inheritance. I asked before, and I'll ask again: What's the OO design
principle behind the classes? Should the Liskov Substitution Principle
hold? Why isn't set a subtype of bag, or vice versa? Why aren't
dictionaries and sequences more closely related? Is the bag type
justified at all?

Right now, I don't know why the hierarchy looks the way it does. For all
I know, the design was, "These collection types seem kinda related.
Let's make one a subtype of the other." What are the actual principles,
and how will they hold up in a prototype-based system or with parametric
polymorphism? Does the hierarchy constrain implementors in ways that you
don't expect?

With no explicit design goals, no complete implementation, and no
examples of the collections in actual use, how are we supposed to trust
that these won't be problems?

> You seem to portray this as if I'm hiding some dark secret about
> limitations to the API.  I'm quite confident in the API in fact, and
> just don't believe that I need 'prove' its good to you by writing a
> bunch of code that won't wind up in the SRFI.

That's nice that you're confident in your own design. But isn't that the
whole point of the reference implementation? We're not *supposed* to
take your implementation on faith. *You're* supposed to demonstrate that
it's actually implementable. Extra points if you can show that it stands
up well under real use.

If it seems like I'm being a jerk about this, it's because my day job is
100% about quality -- code reviews, design reviews, testing,
verification, making sure that we have more than just the developer's
word that a system is good. This "I don't need to prove that my design
is good" attitude doesn't fly in a code review. And isn't that exactly
what the SRFI draft process is?

> I have limited time for this SRFI among many many projects, so this is
> strictly pragmatic.

Again, that's nice. It's not going to convince me to lower my standards
for code reviews, though.

>> So while I agree that there is value in a stable API, I don't think
>> there's any value in rushing an immature API to finalization just so
>> that we can call it "stable." It'd be much better to propose a
>> concrete collection or three, get some experience with their use, and
>> it they're successful, factor out the interface and publish that as a
>> SRFI.

> This is arguably bad design. Writing many collections and then culling
> the common operators is a great way to wind up with a poorly thought
> out system.

Not if those collections are themselves well-designed. Design a good
system, implement it, determine what can be reused or factored out for
future projects. That's how reuse works. If you haven't already, I
strongly recommend that you study it. I can dig out some good titles if
you like, although they're all for C++ rather than Scheme.

Write some real, useful collections to your spec. Use them in a project
or two -- doesn't need to be anything huge, just something bigger than a
toy. Change the interfaces that are too cumbersome, maybe add a few.
Review the design again. Factor out everything that worked well and
publish that. Even better, just publish the useful collections and let
the design speak for itself. That would be *much* more useful than an
untested design doc.

> Especially when the stated goal here is to create a general,
> *interoperable* framework for collections.

Once again: It isn't general or interoperable until you've seen it in
use, until you've tried to use it and reuse it. Until then, it's just an
interface that you *hope* is reusable. Many projects start out by
assuming that interfaces like these will be reusable. They typically
fail. Again, spend some time studying reusability if you haven't
already; getting it right is a LOT harder than it seems.

>> Actually, if they're successful, there's less need to publish the
>> interface separately. Future developers will imitate it like they
>> already imitate SRFI-1. That's a much more desirable outcome: We get
>> useful collection types *and* a de facto standard interface that way.

> Name one language with a useful generic collections library that got
> it this way?

You have heard of C++, right?
-- 
Bradd W. Szonye
http://www.szonye.com/bradd