[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Interface view of dictionaries

On Sat, 25 Oct 2003 scgmille@xxxxxxxxxxxxxxxxxx wrote:

>Lets examine your proposed API for dictionaries, and contrast it with

>> (*-fetch-deleting! dict key)
>(define (*-fetch-deleting! dict key)
>  (let ([rv (*-get dict key)])
>    (*-remove! dict key)
>    rv))

problem:  This definition is going to duplicate the effort of
looking up the key.  For most dictionaries, it's going to be
more efficient to provide a direct implementation in terms of
the internal structures you're trying to abstract.  The
fetch-deleting! forms allow such more efficient implementations
to exist.  Trying to duplicate them with externally-defined
functions, as above, will work, but it is clumsy and will, for
most dictionary structures, slow the code down.

>Additionally, fetch-deleting! does not map to many dictionaries, for
>example disk file database, *without* doing the above.

Right.  Versions of fetch-deleting! that suck can be applied to
dictionaries whose structure implies that fetch-deleting! *has*
to suck.  Let's not hobble other dictionaries because of that.

>>      ;; get just gets one or more elements without caring which.
>> (*-get-one dict)
>(*-get-left dict)

What does "left" even *MEAN* here?!  Why is it in this name?

>> (*-get-one-deleting! dict)
>As in [A], with the same caveats.

And the same results.  Save the implementations that suck for
dictionaries whose structure demands that they suck and allow
space for better implementations where they can exist.

>> (*-get-n dict num)
>> (*-get-n-deleting! dict num)

>Trivially defined with enumeration.

And with the same problem.  On a lot of dictionaries, if I go
in knowing I want the first N elements, I can get them at one
swell foop and not waste time traversing the structure again
and again for each one.  Failure to provide a name for this
function practically demands an implementation that sucks,
which will then be applied to all dictionaries, whether the
implementation of this function for that dictionary had to
suck or not.

>>      ;; the following two are for functions of the signature
>>      ;; (callback dict keyval dataval)
>> (*-foreach dict callback)
>> (*-foreach! dict callback)

>(collection-fold-left dict fold-function)

>;; a copy is made of the dict to protect it from side effects
>Are you generating a new collection?  Then *-map is the name you want.
>for-each implies iteration over values (which is what the enumerators
>are for).

And when joe code writer is looking at the API going, I want
something that applies a function to each mapping, "foreach" is
going to attract his attention.  "Map" may or may not, even though
it's been used for lists.  "Foreach" is simply more obvious in its
intent than "Map."  Also, "Map", if taken to mean the same thing
it means with lists, should apply to MULTIPLE dictionaries, and
create a new dictionary with the results of calling the callback
function on the sets of mappings. That's not what someone is going
for with "foreach."

The existence of mutating and non-mutating versions (with and
without the exclamation point) is going to assure the coder that
the one can be used, even with destructive callback functions,
without worrying about destructive changes, but probably has a
higher overhead, and that the other allows the callback functions
to mutate the existing structure, or can be used with nondestructive
functions without incurring more overhead.

These are examples of functions that do exactly what their name
says they do, which will be recognized instantly by coders.  "Map"
implies things that these functions don't do, which are done by
"Map" on lists.

>> (get-iterator dict)   ;; returns an iterator function that
>>                       ;; takes a key arg and returns some "next" key,
>>                       ;; or takes '() and returns some (arbitrarily
>>                       ;; chosen for unordered dictionaries) "first" key.
>> (*-nextkey dict) ;; same as function returned from get-iterator
>> (*-nextkey dict key) ;; same as returned from the get-iterator func above
>> (*-lastkey dict)     ;; same as returned from the get-reversed-iterator func
>> (get-reversed-iterator dict)
>See Olegs post on the hazards of iterators.

I've seen it.  The "Hazards" arise when attempting to misuse them, eg,
using them for things that aren't finite, using them across mutations
of things, etc.  If you provide fold-left and fold-right, the first thing
joe coder is going to do with them is implement iterators, because iterators
are simpler to use.

><snip>Efficiency procs
>Best left to a separate SRFI.

Providing identical APIs for structures of profoundly different performance
characteristics is a minefield in terms of efficiency.  Not giving people a
way to check removes the last possible warning sign before the minefield is
entered.  Performance is important, especially if providing an API that makes
it so easy to write code whose performance sucks rocks.

>> (*-fetchfirst dict number) ;; returns number entries
>[B] Trivially implemented with collection-fold-left

But in most cases less efficiently than a native implementation could do it.

>> (*-fetchfirst-deleting! dict)
>> (*-fetchfirst-deleting! dict number)
>See [A]

Uh huh.  I'm sensing a pattern here.

>> (*-fetchlast dict)
>(*-get-right dict)
>> (*-fetchlast dict number)
>See [B]

Same argument, same problem.

>> (*-fetchlast-deleting! dict)
>> (*-fetchlast-deleting! dict number)
>See [A]

Same argument, same problem.

>> (*-fetchrange dict key1 key2)
>Possibly novel, but also implementable with enumeration

The only way to implement it using the procedures in the SRFI is
inferior, for most dictionaries, to the implementation that can
be provided internally.  if I'm getting a range, I don't want to
waste CPU time traversing the structure for each single damn

>> (*-fetch-next-n dict key1 number)
>> (*-fetch-next-n-deleting! dict key1 number)
>> (*-fetch-prev-n dict key1 number)
>> (*-fetch-prev-n-deleting! dict key1 number)
>> (*-fetchrange-deleting! dict key1 key2)
>Above + [A]

I'm offering you a way to avoid retraversing the structure for every
element.  You're saying you don't need it because you don't mind
writing code that sucks and retraverses the structure for every
element.  Trust me when I say that most people mind.  Most people want
to write code that's efficient.

If I want to take the last ten elements of a million-element list, I
don't want an implementation that gets them one at a time, by
traversing the entire damn list from the head for each one.  I
especially don't want an implementation that traverses the list ten
times to get them and ten more times to delete them.

I want a tail recursive algorithm that traverses that list exactly
once and comes back with the goods.  This is what all the fetch-n,
get-n, fetchrange, fetch-deleting!, etc, are about.  These functions
need to be provided by the implementor of the dictionary because the
implementor can, for most dictionaries, provide efficient versions.
With the API you're proposing, the efficient versions cannot be

> So, you see, your API is almost *exactly* like the current SRFI
> apart from minor naming variations.  Where it differs, there has
> been strong refutation for that approach.  I must ask, did you
> *read* the SRFI carefully?

My API allows efficient implementations.  The one proposed in the SRFI
does not.  Because of that, they are not even remotely similar.  Yes,
you could use the SRFI API to provide whacking slow versions of most
of the functionality.  But every coder who has to live with them is
going to hate them, either because they will waste CPU time that does
not have to be wasted, or because redefining R5RS functions won't play
nice with the existing code and module system, or both.