
[oleg@xxxxxxxxx: Minor quibbles on the latest draft]

[forwarded on behalf of Oleg]

Hello!

	I'm sorry I'm late to the party, which has almost finished. It
looks like a great party.

	The following comments apply to the latest draft
	http://sgmiller.org/srfi-44.html

"As standard Scheme lacks the ability to automatically dispatch on
different function behaviors based on the types of arguments, it
is difficult to devise a portable implementation of this SRFI which
allows arbitrary future collections. It is expected that Scheme system
implementors will take advantage of generic function or
object-oriented features of their systems to implement this SRFI
efficiently."

I wonder why a module system was not mentioned at all? Quite often
people use a particular standard collection as an implementation
vehicle for their advanced data structure. A collection will be
encapsulated into the data structure. For example, a graph data
structure may use a dictionary to store adjacency information for a node
(or to store a cost value for a pair of nodes). In that case, most
of the implementation code is monomorphic with respect to the
collection type and no run-time dispatch is required. Given an
advanced module system, I can import a particular dictionary
implementation (e.g., associative lists) with renaming
	make-alist => make-mydict
	alist-get  => mydict-get
etc. In my own code I would use the names make-mydict, mydict-get,
etc. exclusively. If at a later time I decide to use a different
dictionary implementation, I would only need to change the module
declaration statement.
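
For concreteness, here is a sketch of such an import, using a
rename-on-import form of the kind several module systems provide (the
(alist) library name and its exports are only an illustration, not
something this SRFI defines):

    (import (rename (alist)
                    (make-alist make-mydict)
                    (alist-get  mydict-get)))

    ;; Client code mentions only the mydict names; switching to, say, a
    ;; hash-table-backed dictionary means editing the import form alone.
    (define adjacency (make-mydict))
    (mydict-get adjacency 'node-a)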
	
I have also taken a look at Edison:
	http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/hfl/hfl/edison/
in particular, Coll/Coll/Collection.hs

The latter prompted a few suggestions.

> procedure: % value ... => % 

>      Constructs a % collection with the zero or more values provided
> as its initial contents. 

Should we add that if the collection is a set, it is unspecified which
element is kept in case of duplicates? Or should duplicates be
outlawed? [Edison takes the former approach].
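
To illustrate the former reading: if a1 and a2 are two distinct
objects that the set's equivalence function treats as equal, and set
is the % constructor of some set type, then

    (set a1 a2 'b)
    ; => a two-element set containing 'b and exactly one of a1 or a2;
    ;    which of the two survives is unspecified.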

> procedure: *-add bag value => *
> procedure: *-add![!] bag value => * 

>      Adds a single value to a bag. 

> procedure: *-add-all dest-bag source-bag => *
> procedure: *-add-all![!] dest-bag source-bag => * 

>      Adds all the values in the source bag to the destination bag. 

Edison says: insert [our add] keeps the new element in case of
duplicates, but insertSeq [our add-all] keeps an unspecified element.
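
In this SRFI's terms that would read roughly as follows, with set-add
and set-add-all standing for the set instantiations of the *
procedures above, and old and new being equal under the set's
equivalence function yet distinguishable by eq?:

    (set-add (set old) new)
    ; => a one-element set whose element is new: the new value wins
    (set-add-all (set old) (set new))
    ; => a one-element set whose element is old or new, unspecified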

Edison defines the procedure count, for the occurrence count of an
element in a collection. Sure, count may be expensive for some
collections. For sets, the occurrence count is defined too (but
required to return either 0 or 1). Edison also defines a procedure

>   instanceName   :: c -> String
>     -- the name of the module implementing c

which can be useful, for example, for generating error messages. The
procedure is useful regardless of a particular module or OO system.
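
A sketch of the kind of use I have in mind, where collection-name is
a hypothetical Scheme counterpart of instanceName returning a string
such as "alist" or "hash-table", and error is as in SRFI 23:

    (define (complain coll key)
      (error (string-append "no such key in a " (collection-name coll))
             key))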

For ordered collections, Edison defines procedures to view, insert or
delete the maximum and the minimum elements. I don't know how useful
this may be (many advanced collections do allow efficient
implementations of such operations -- especially for collections that
are used for priority queues or deques).
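
For instance, a priority queue then takes only a few lines; here
ordered-bag-get-min and ordered-bag-delete-min! are hypothetical names
for the view-minimum and delete-minimum operations:

    (define (priority-queue-pop! pq)
      ;; return the minimum element and remove it destructively
      (let ((x (ordered-bag-get-min pq)))
        (ordered-bag-delete-min! pq)
        x))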

It goes without saying that many operations in Edison -- like union or
intersection of two collections -- are perhaps too high-order or too
laborious or too infrequent to warrant their specification in
SRFI-44. It's better to stick to the bare minimum and provide
extensions only for very frequent operations.


> procedure: *-ref sequence integer => boolean 

>      Returns the value stored in the sequence at the integer index
> provided. It is an error to reference an index outside the range of
> the sequence. 

The return type can't be boolean. Right?

BTW, a function *-ref-with-default often turns out to be surprisingly
useful. It takes an extra argument -- the default value -- and returns
that argument literally (in the sense of eq?) if the lookup fails.
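
A sketch, with dict-ref-with-default as a hypothetical name:

    ;; The default is returned literally, i.e. eq? to the argument, so
    ;; a fresh object makes a reliable "not found" sentinel even when
    ;; #f or '() are legitimate stored values.
    (define not-found (list 'not-found))
    (define (edge-cost costs edge)
      (let ((c (dict-ref-with-default costs edge not-found)))
        (if (eq? c not-found)
            0                  ; missing edge: treat the cost as zero
            c)))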

Edison's sequences,
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/hfl/hfl/edison/Seq/Sequence.hs

are also interesting, in particular procedures lview, rview, inBounds,
adjust and zippers. However, one may also leave them out for SRFIs to
come.



> (eq? <input collection> (update-procedure <input collection>)) will
> always return a non-false value.

Perhaps this statement is too conservative? R5RS requires that eq? (as
well as eqv? and equal?) return a boolean value. Therefore, the only
non-false return value of eq? is #t. It might be a good idea to say
that explicitly. 


> The collection fold procedure then invokes the folding function on a
> value of the collection and the seeds, and receives from the folding
> function a proceed value followed by a like number of seed return
> values. If the termination value is non-false,

Perhaps you meant "the proceed value" rather than "the termination
value"; the term "termination value" is not mentioned anywhere else in
the document.
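
Assuming that reading, a folding function with one seed would look
like this, with bag-fold-left standing for the bag instantiation of
the collection fold procedure:

    ;; Sum the elements of bag b, stopping at the first negative one.
    ;; The folding function returns the proceed value first, then the
    ;; new seed; folding continues while the proceed value is non-false.
    (bag-fold-left b
                   (lambda (x sum)
                     (if (negative? x)
                         (values #f sum)          ; stop, keep current sum
                         (values #t (+ sum x))))  ; proceed with new seed
                   0)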

> In order to ensure a consistent ordering in the collection, the
> ordering function must return the same result for like inputs over
> time. 
In other words, an ordering function must be a pure function, a
mapping of its arguments. One might add that an implementation of a
collection may assume 
    (and (equal? x x1) (equal? y y1))
      ==> (equal? (ordering-function x y) (ordering-function x1 y1))

Should the same requirement be applied to an equivalence function as
well?

	Cheers,
	Oleg
