[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[oleg@xxxxxxxxx: Interface view of dictionaries]



Ray Dillinger wrote:
> And when joe code writer is looking at the API going, I want
> something that applies a function to each mapping, "foreach" is
> going to attract his attention.  

Allow me to paraphrase your argument: "When Joe the coder looks at
Scheme for a non-local-exit, he would probably look for something like
GOTO, BREAK, or RETURN. But he finds none of that. He might come
across call-with-current-continuation (because it's such a long name),
but it probably will not attract his attention as GOTO would have." Is
it a reasonable paraphrase? Should R5RS authors have added BREAK,
RETURN and GOTO to Scheme, as CL has done? It seems the authors have
chosen a different approach -- they added the most general control
function, call-with-current-continuation. The latter can be used to
write break, return, goto, throw. We can also employ
call-with-current-continuation to implement YIELD, for example, which
became quite fashionable lately. Python and now C# have to add the
corresponding keyword to the language. OTH, Scheme already has it.

The analogy between fold and call/cc is actually deep. The existence
of call/cc does not mean that GOTO or RETURN are banned. Just the
contrary. If a programmer needs them, he can easily get them, in a
generic and easy way. The same with fold. I have never argued that
cursors should be outlawed. I do say that cursors are useful and even
indispensable (although not as often as one might think). 

> If you provide fold-left and fold-right, the first thing joe coder is
> going to do with them is implement iterators, because iterators are
> simpler to use.
If that Joe finds cursors are the right tool for him, he can get them.
Let me quote the conversion function:

(define (lfold->lazy-list lfold collection)
  (delay
    (call-with-current-continuation
      (lambda (k-main)
        (lfold collection
          (lambda (val seed)
            ;(cerr (list val seed) nl)
            (values
              (call-with-current-continuation
                (lambda (k-reenter)
                  (k-main
                    (cons val
                      (delay
                        (call-with-current-continuation
                          (lambda (k-new-main)
                            (set! k-main k-new-main)
                            (k-reenter #t))))))))
              seed))
          '())                          ; Initial seed
        (k-main '())))))


The second half of
	http://pobox.com/~oleg/ftp/Scheme/enumerators-callcc-code.scm
shows the enumerator for files and the test cases.

As you can see, the conversion code works with EVERY SRFI-44 fold, for
every possible collection -- present and future. So Joe never needs to
implement cursors. He merely needs to invoke the conversion function
and pass it the enumerator function of the desired collection. If
call/cc is implemented efficiently (as it is on Chez, Gambit, and now
Scheme48; On Chicken, call/cc is free), the cursor obtained from the
enumerator _is_ efficient. See below on folding over multiple
collections.

> I've seen it.  The "Hazards" arise when attempting to misuse them, eg,
> using them for things that aren't finite, using them across mutations
> of things, etc. 

I don't remember arguing about hazards of cursors on infinite
collections. I claimed that cursors are difficult to write. If a
collection is based on an advanced (tree) structure, writing a cursor
is quite a challenge. No wonder the Python community is so excited about
'yield' (which is the inverted enumerator). The other notable problem
with cursors is their indeterminate extent. It's not clear when you're
done.

> If you provide fold-left and fold-right, the first thing joe coder is
> going to do with them is implement iterators, because iterators are
> simpler to use.

The opinion expressed in the latter phrase reminds me of assertions
that "imperative code is more natural" or "the infix notation is
simpler to use". USENET brings a host of such assertions every
month. Guido van Rossum has publicly stated recently that "loops may
be theoretically inferior to recursion, but I have no doubt that the
human brain has special reasoning abilities for loops, and many
real-world problems are most naturally expressed using loops rather
than recursion."

The other day I was re-reading my notes on Jeffrey Ullman's book
"Principles of Database Systems" and came across an amusing comparison
between a cursor-based network database query language DBTG, and
SQL. Jeffrey Ullman says that experience has shown that cursor-based
code is error-prone. In contrast, SQL's SELECT encapsulates the
traversal without leaking the state and thus makes application code
safer without compromising efficiency.

If you read the Haskell mailing list you probably noticed a great
amount of angst about IO. One person wrote that his favorite method of
accessing a file is an _enumerator_ (of a file considered as a
collection of characters, lines, or words). Many of the problems that
plague IO in Haskell just disappear.

I exhort you, could we just stop using the word 'iterator'? It means
completely different things in C++ and Ocaml, for example.

>> (*-fetchfirst dict number) ;; returns number entries
> [B] Trivially implemented with collection-fold-left

I don't think the argument about efficiency of implementing
*-fetchfirst via a collection-fold-left is productive. If a particular
collection permits an efficient implementation of *-fetchfirst, and if
a programmer writes an application that truly needs an efficient
*-fetchfirst, then the programmer can go ahead an write that efficient
*-fetchfirst for his particular collection. As I understand, SRFI-44
does not prohibit additional, collection-specific operations. True,
these operations make code less generic. OTH, there is always a
trade-off between genericity and specialization/performance, which has
to be resolved by a programmer using his judgment and considering all
circumstances. 

If I read the intent of SRFI-44 correctly, its goal is not to provide
all things for all people. Rather, the goal is to define the framework
and to describe a _minimal_ set of core functions plus a _limited_ set
of very common extensions. The intent is not to make it unnecessary to
add more API in future SRFIs. Rather, the intent is to make it
unnecessary to remove SRFI-44 features in future SRFIs.

In this vein, I argue we do need to pay attention to consistency, both
within the SRFI and in regards to other extensions. I believe concerns
raised by Shiro Kawai ought to be addressed (at least by discussing
them in the text of SRFI and noting the rationale for the deviations
were they exist).


People who want rich APIs have stated their preference. Probably I
should be allowed to state mine: I do not like rich APIs, both as a
user and an implementor. I have not yet encountered any need for a
*-fetchfirst function, for example. It is unlikely therefore I would
implement that function _natively_ in any of my collections. What if
someone really needs a very efficient realization of *-fetchfirst?
Well, he would find a collection that does that. That would not be
mine collection -- and that's fine. I don't want my code to be the
last word. I don't write code to fit everybody's purpose. I don't have
problem implementing extras (which I don't use) if it gives a notable
benefit of generality. I would not like implementing a lot of features
just because there is one person who needed that particular feature
once or twice. In general I don't like writing code that I don't
extensively use and thus have little experience using.


About experience using SRFI-44. I can see how to change my
implementation of treaps to fit SRFI-44. I have implemented TIFF
dictionaries to be somewhat like SRFI-44 collections. I didn't
implement everything that SRFI-44 requires -- and a good thing I
didn't as the standard is still evolving. I wrote %-fold-left for
files, TIFF dictionaries. I have been using folds for database
connections, in a _production_ code. Previously, I was using for-each
and realized that wasn't the best thing.


About fold-right. I'd rather wish to see fold-right gone. In a strict
language, it's not too useful, in my experience. Furthermore, if a
programmer wishes to apply fold-left to a reverse collection, perhaps
he should do just that (see below).

Reversible collections is another thing that I'm uneasy about. A
concept of views has received quite a lot of attention recently. In
that vein, we might wish for a function *-reverse that creates a
reverse view of a collection, if it's possible to do so efficiently.
	*-reverse coll -> coll

I can see how *-reverse can be *efficiently* implemented for a treap
(without copying the whole treap, of course). If we take the advantage
of an OO system, reverse can be a wrapper that creates a different
(sub)type, which will cause dispatch to (efficient) fold, getfirst,
etc. functions without exposing those functions to a programmer. So,
if we wish to enumerate the collection in reverse, we would write
	(collection-fold-left (collection-reverse coll) fn seed ...)

I'm uneasy to submit this approach for consideration into SRFI-44.  I
have not used it. I don't know how well it works. I'd prefer the
reverse view approach, if it is feasible at all, explained in a
separate SRFI, after we have implemented and used it.


Bradd W. Szonye wrote:
> Why the heck did SRFI-1 define the folding function to accept the seed
> argument *last*? There's only one seed but many lists, which naturally
> suggests a (f seed . data) signature for folding functions that can
> work on any number of lists. Instead, it specifies (f datum0 datum1
> ... seed). Why?!

My database interface actually works this way: one seed and multiple
data items. Most of the time I actually need several seeds. I noticed
that I constantly do packing and unpacking of these seeds into one
argument. I found therefore that multiple seeds work better in
practice. Furthermore, unlike folds on lists, folds over several
collections aren't that useful, see below. It's far more convenient to
arrange the variable number of seeds to be the last arguments.

Bradd W. Szonye wrote:
> And, most importantly, can you show how to implement a
> multiple-collection fold without cursors?
>
>     (nfold f seed c1 c2 c3 ...)

First of all, who said that we should ban cursors completely? When
traversing multiple collections, cursors are actually useful and
should be used, in my opinion. In contrast, folds over multiple
collections don't seem to be useful. For one thing, a fold over two
collections traverses the collections in a "lock-step". However, we
often need to enumerate each collection at its own pace (especially
when computing a join). Furthermore, the great benefit of an
enumerator is that it is privy to collection's internals and therefore
can do traversal efficiently. It's hard to make an enumerator that is
privy to details of two different collections. When dealing with
several collections, it's quite often in my experience that these
collections are of different types.

Of course, if one wishes for nfold, one can easily implement it: use
enumerators for c1, c2, c3, invert them into cursors, and use the
latter to write nfold. nfold can be written; it doesn't have to be
provided _natively_, as a primitive.


Given the calls for vote, I vote for finalizing. I see the benefit of
a stable API, which I can look up to when writing new collections or
updating the old ones.


Are you going to LL3? I'll have a poster presentation at LL3 -- about
enumerators, cursors. I also touch upon generators and iteration
inversion in languages without first-class continuations (such as
Haskell).



----- End forwarded message -----

-- 

Attachment: pgpIFQFYjobKk.pgp
Description: PGP signature