[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Still on David's issues; SRFI 44



Panu Kalliokoski wrote:
Whether SRFI 69 should be about general mappings or hash tables, has
been IMO adequately addressed; if there are suggestions about how to
make it clearer in the abstract that it _is_ about hash tables, they are
welcome.

In light of your comments, I agree with you Panu. This is not just a mapping datastructure. If my comments seemed off-base it is only because I am confused as to what the precise purpose of this SRFI is. Without knowing the precise purpose, I cannot make concrete suggestions on the document. Your response clarifies some, and I appreciate your patience and work. My suggestions are below. I have no intentions of being insulting.

I have the following suggestions for the abstract and rationale. If you comment on these I can provide concrete suggestions for the rest of the document. I have tried to characterize the purpose of this SRFI as making it possible to write portable code that uses hash tables in the most common ways while remaining efficient. Do you agree with that characterization?

I think the simple and generic aim should be dropped. The design of this SRFI should be simple and generic only so far as it furthers the above aim.

---8<---

Abstract

;; outline the need for, and design of, the proposal.

Hash tables are mutable data structures meeting certain complexity requirements that provide a mapping from some set of keys to some set of values associated to those keys. When a good hash function is used, key lookup and destructive update must be performed in amortised constant time.

Widely recognised as a fundamental data structure, most Scheme systems provide their functionality. However, no Scheme standard exists for hash tables and the functionality and interfaces provided by implementations varies widely, making it difficult to write portable programs that use hash tables.

This SRFI specifies an API for hash tables designed so that portable programs can be written which make efficient use of common hash table functionality.


Rationale

;; explain why the proposal should be incorporated as a standard feature in
;; Scheme implementations. If there are other standards which this proposal
;; will replace or with which it will compete, the rationale should explain
;; why the present proposal is a substantial improvement.

Hash tables are widely recognised as a fundamental data structure for many kinds of computational tasks. Almost every non-minimal Scheme implementation provides some kind of hash table functionality, although there is no existing standard.

Alas, although somewhat similar, these hash table APIs have many differences: some trivial, like the naming of certain functions; some complex, like revealing different aspects of the internal implementation to the user; some coarse, like requiring keys to be of some specific type(s); some subtle, like requiring the user to guess the size of the hash table in advance to get optimal performance. As a result, it is difficult to write portable programs that use hash tables.

The primary aim of this SRFI is to establish a standard API for hash tables so that portable programs can be written which make efficient use of common hash table functionality. The API resolves the discrepancies between the various names and semantics for hash table operations provided by Scheme systems by standardizing the names and behaviors of the most common operations. Incorporating this SRFI as a standard feature of Scheme implementations makes it possible to write efficient and portable programs that use hash tables.

---8<---

Suggestions on the text given the aim of making it so that portable programs can be written which make efficient use of common hash table functionality:

There are several things in the document that *may* lead to more efficient use depending on the implementation. I believe all of these should be dropped, leaving specified only those things which *must* lead to efficient use in all conforming implementations. Specifically:

Drop size-hint. Size-hint may be ignored as currently specified, thus its status as an optional argument to hash table constructors does nothing to make efficient use of hash tables portable. Allow implementations to extend the parameters arbitrarily to make implementation specific improvements.

Drop type specific hash tables. If they are simply shorthands as stated, they do nothing to make efficient use of hash tables portable.

If you'd like to suggest how particular implementations can improve efficiency, you could add the following: Implementations may improve efficiency by specializing the hash table implementation when given the equivalence procedure =, string=?...

I don't see any reason to complicate the API, in what I would say is an ad-hoc manner, for non-portable gains in potential efficiency.

If you decide not to do this, you should add a rationale for including type specific hash tables, and then a rationale for including this particular set of types. What is meant by "shorthand" is underspecified. What are the equivalence procedures used? What happens when you add a string key to symbol table? There should be some way of determining if a hash-table value was constructed with one of these constructors, eg symbol-hash-table? or hash-table-type (as in Gauche).


I think you should state the following in the beginning of the specification section. This seems implied by the current text, but this is more explicit:

---8<---

An implementation that does not provide lookup and destructive update in amortised constant time (when a good hash function is used) does not conform to this SRFI.

An implementation that does not provide good hash function definitions for
hash, {string,string-ci,symbol}-hash, and hash-by-identity does not conform to this SRFI.

---8<---


From the document:

   If some key occurs multiple times in alist, it is unspecified which of the
   corresponding values will end up in hash-table.

Why not specify that they are added in the order they appear in the alist, such as MzScheme does. If this is not the case, portable programs can't rely on this function unless it is never the case that a key appears twice in the alist. Is there something gained from this being unspecified that outweighs the loss in portability? There is no external syntax for hash tables so it seems to read and write hash tables portably one needs a well specified alist->hash-table and hash-table->alist.


But because David seems to be very concerned about the neglection of
SRFI 44, I probably have to tell why I'm not heeding its call.

I really have very little preference as to whether this SRFI follows the conventions of SRFI 44, or if the proposed hash table datatype fits into the subtype hierarchy of SRFI 44 collections. I'm asking only for a rationale why these choices were made. Much of your email is an irrelevant critique of SRFI 44, but you do provide the basis of a rationale for your choices that should be incorporated into the document. I believe what I've quoted below is relevant and can be summarized and included.

What would SRFI 44 API of hash tables look like?

Hash tables would have two supertypes: map and collection.  Collection
operations would forget about the keys.  No indexed collection could be
a subtype of a hash table, and no keyed collection could be a subtype of
an indexed collection.

There would be an asymmetry between items in a set and keys in a map:
even though these two are very similar, the former are treated as
"values" while the latter are treated as "keys".  Consequently,
operations for these have different names.

Hash tables could have a value equivalence function which would be never
used, but the user could not supply a hash function.  This implies that
appropriate hash functions could be guaranteed only up to some specific
coarseness (as with SRFI 69 defaults), and beyond that, hash tables
could not guarantee their complexity constraints.  In fact, every time
the user supplied an unrecognised key equivalence function, the
implementation would have to use a constant hash function (like (lambda
(x) 0) for example) to guarantee correctness, destroying all the value
of hash tables.

Hash tables would lack an operation that folds with both keys and
values, and hash-table->alist.  Generally, _all_ access to the actual
associations of the hash table would be indirect.

hash-table-update! would not be there, forcing people to look up the key
twice when they update the value.

Hash tables should be able to implement -delete-from! for a bag datatype
that does not yet exist.

Despite all this, I've tried to craft SRFI 69 so that it does not
_prevent_ a SRFI 44 style API to hash tables.


David