[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Still on David's issues; SRFI 44

This page is part of the web mail archives of SRFI 69 from before July 7th, 2015. The new archives for SRFI 69 contain all messages, not just those from before July 7th, 2015.

To: srfi-69@xxxxxxxxxxxxxxxxx
Subject: Re: Still on David's issues; SRFI 44
From: David Van Horn <dvanhorn@xxxxxxxxxxxxxxx>
Date: Wed, 17 Aug 2005 10:00:24 -0400
Delivered-to: srfi-69@xxxxxxxxxxxxxxxxx
In-reply-to: <20050814191154.GA2256@humma>
References: <20050814191154.GA2256@humma>
User-agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.6b) Gecko/20031206 Thunderbird/0.4

Panu Kalliokoski wrote:

Whether SRFI 69 should be about general mappings or hash tables, has
been IMO adequately addressed; if there are suggestions about how to
make it clearer in the abstract that it _is_ about hash tables, they are
welcome.

In light of your comments, I agree with you Panu. This is not just a mappingdatastructure. If my comments seemed off-base it is only because I amconfused as to what the precise purpose of this SRFI is. Without knowing theprecise purpose, I cannot make concrete suggestions on the document. Yourresponse clarifies some, and I appreciate your patience and work. Mysuggestions are below. I have no intentions of being insulting.

I have the following suggestions for the abstract and rationale. If youcomment on these I can provide concrete suggestions for the rest of thedocument. I have tried to characterize the purpose of this SRFI as making itpossible to write portable code that uses hash tables in the most common wayswhile remaining efficient. Do you agree with that characterization?

I think the simple and generic aim should be dropped. The design of this SRFIshould be simple and generic only so far as it furthers the above aim.


---8<---

Abstract

;; outline the need for, and design of, the proposal.

Hash tables are mutable data structures meeting certain complexityrequirements that provide a mapping from some set of keys to some set ofvalues associated to those keys. When a good hash function is used, keylookup and destructive update must be performed in amortised constant time.

Widely recognised as a fundamental data structure, most Scheme systems providetheir functionality. However, no Scheme standard exists for hash tables andthe functionality and interfaces provided by implementations varies widely,making it difficult to write portable programs that use hash tables.

This SRFI specifies an API for hash tables designed so that portable programscan be written which make efficient use of common hash table functionality.



Rationale

;; explain why the proposal should be incorporated as a standard feature in
;; Scheme implementations. If there are other standards which this proposal
;; will replace or with which it will compete, the rationale should explain
;; why the present proposal is a substantial improvement.

Hash tables are widely recognised as a fundamental data structure for manykinds of computational tasks. Almost every non-minimal Scheme implementationprovides some kind of hash table functionality, although there is no existingstandard.

Alas, although somewhat similar, these hash table APIs have many differences:some trivial, like the naming of certain functions; some complex, likerevealing different aspects of the internal implementation to the user; somecoarse, like requiring keys to be of some specific type(s); some subtle, likerequiring the user to guess the size of the hash table in advance to getoptimal performance. As a result, it is difficult to write portable programsthat use hash tables.

The primary aim of this SRFI is to establish a standard API for hash tables sothat portable programs can be written which make efficient use of common hashtable functionality. The API resolves the discrepancies between the variousnames and semantics for hash table operations provided by Scheme systems bystandardizing the names and behaviors of the most common operations.Incorporating this SRFI as a standard feature of Scheme implementations makesit possible to write efficient and portable programs that use hash tables.


---8<---

Suggestions on the text given the aim of making it so that portable programscan be written which make efficient use of common hash table functionality:

There are several things in the document that *may* lead to more efficient usedepending on the implementation. I believe all of these should be dropped,leaving specified only those things which *must* lead to efficient use in allconforming implementations. Specifically:

Drop size-hint. Size-hint may be ignored as currently specified, thus itsstatus as an optional argument to hash table constructors does nothing to makeefficient use of hash tables portable. Allow implementations to extend theparameters arbitrarily to make implementation specific improvements.

Drop type specific hash tables. If they are simply shorthands as stated, theydo nothing to make efficient use of hash tables portable.

If you'd like to suggest how particular implementations can improveefficiency, you could add the following: Implementations may improveefficiency by specializing the hash table implementation when given theequivalence procedure =, string=?...

I don't see any reason to complicate the API, in what I would say is an ad-hocmanner, for non-portable gains in potential efficiency.

If you decide not to do this, you should add a rationale for including typespecific hash tables, and then a rationale for including this particular setof types. What is meant by "shorthand" is underspecified. What are theequivalence procedures used? What happens when you add a string key to symboltable? There should be some way of determining if a hash-table value wasconstructed with one of these constructors, eg symbol-hash-table? orhash-table-type (as in Gauche).

I think you should state the following in the beginning of the specificationsection. This seems implied by the current text, but this is more explicit:


---8<---

An implementation that does not provide lookup and destructive update inamortised constant time (when a good hash function is used) does not conformto this SRFI.


An implementation that does not provide good hash function definitions for

hash, {string,string-ci,symbol}-hash, and hash-by-identity does not conform tothis SRFI.


---8<---


From the document:

   If some key occurs multiple times in alist, it is unspecified which of the
   corresponding values will end up in hash-table.

Why not specify that they are added in the order they appear in the alist,such as MzScheme does. If this is not the case, portable programs can't relyon this function unless it is never the case that a key appears twice in thealist. Is there something gained from this being unspecified that outweighsthe loss in portability? There is no external syntax for hash tables so itseems to read and write hash tables portably one needs a well specifiedalist->hash-table and hash-table->alist.

But because David seems to be very concerned about the neglection of
SRFI 44, I probably have to tell why I'm not heeding its call.

I really have very little preference as to whether this SRFI follows theconventions of SRFI 44, or if the proposed hash table datatype fits into thesubtype hierarchy of SRFI 44 collections. I'm asking only for a rationale whythese choices were made. Much of your email is an irrelevant critique of SRFI44, but you do provide the basis of a rationale for your choices that shouldbe incorporated into the document. I believe what I've quoted below isrelevant and can be summarized and included.

What would SRFI 44 API of hash tables look like?

Hash tables would have two supertypes: map and collection.  Collection
operations would forget about the keys.  No indexed collection could be
a subtype of a hash table, and no keyed collection could be a subtype of
an indexed collection.

There would be an asymmetry between items in a set and keys in a map:
even though these two are very similar, the former are treated as
"values" while the latter are treated as "keys".  Consequently,
operations for these have different names.

Hash tables could have a value equivalence function which would be never
used, but the user could not supply a hash function.  This implies that
appropriate hash functions could be guaranteed only up to some specific
coarseness (as with SRFI 69 defaults), and beyond that, hash tables
could not guarantee their complexity constraints.  In fact, every time
the user supplied an unrecognised key equivalence function, the
implementation would have to use a constant hash function (like (lambda
(x) 0) for example) to guarantee correctness, destroying all the value
of hash tables.

Hash tables would lack an operation that folds with both keys and
values, and hash-table->alist.  Generally, _all_ access to the actual
associations of the hash table would be indirect.

hash-table-update! would not be there, forcing people to look up the key
twice when they update the value.

Hash tables should be able to implement -delete-from! for a bag datatype
that does not yet exist.

Despite all this, I've tried to craft SRFI 69 so that it does not

_prevent_ a SRFI 44 style API to hash tables.



David

References:
- Still on David's issues; SRFI 44
  - From: Panu Kalliokoski

Prev by Date: Mutating keys
Next by Date: Drop ref/default, update!/default
Previous by thread: Re: Still on David's issues; SRFI 44
Next by thread: Mutating keys
Index(es):
- Date
- Thread