
various remarks





Nice work! If accepted, this could increase portability a lot, and
I am eager to write (require "srfi-69") for the first time.

A few remarks on the gory details:

* You specify that hash-table-keys and hash-table-values do not
necessarily produce the elements in the same order. That sounds
highly dangerous to me, and it is not clear to me what the gain is.

In fact, it would be reason enough for me to forget about -keys and -values
and fall back on -fold (as you do in the reference implementation, and as
sketched below) to avoid introducing an error that would be very hard to find.
Are there any relevant hash-table implementations that are much faster
and enumerate keys and values in differing orders?
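
(For illustration, a minimal sketch of what I mean, assuming the draft's
hash-table-fold; the procedure name is made up: one enumeration that yields
keys and values guaranteed to correspond by position.)

   (define (hash-table-keys+values table)
     ;; returns a pair (keys . values), built in a single pass
     (hash-table-fold table
                      (lambda (key value accum)
                        (cons (cons key (car accum))
                              (cons value (cdr accum))))
                      (cons '() '())))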

* In view of possible confusion with SRFI 67 (compare procedures), would
you consider renaming "comparison" to something that refers more
to an equivalence relation than to a general comparison? One could
simply call the parameter /equal/ with default equal?.

* I also second Neil's proposal to abandon the aliases.

Of course this is a matter of personal preference, but I can relate
my own experience with aliases: in Ruby, Matz introduced lots of
aliases in the built-in classes. At first that seemed quite useful, but the
more I used them the less I liked them. In the end I always pick one name
(e.g. either get or ref) and forget the other---and when somebody else uses
the other, I need to consult the documentation to make sure it really is the
same thing. It is not clear what is gained by the aliases. (By the way, I
would go for ref, set!, and delete!)

* In the SRFIs I wrote in the past, I tried hard to stay clear of identifiers
that are already used in the major Scheme implementations. This helps
implementors integrate the new proposal into their systems without conflict,
even if its functionality could supersede what they already have. This is
important because mature systems should not change abruptly.
   Of course, for your SRFI this is really hard because most of the names
may be in use already---but anyway.

* You require "This operation should have an (amortised) complexity
of O(1) with respect to the number of associations in hash-table."
You might want to emphasize that this is conditional on a good hash
function being used.

* Something more of an experimental feature: My problem with hashing
in practice is that it is often surprisingly difficult to come up with a good
hash function for a new data structure. Would it be possible to provide
some help with this?

One way to do this is a set of procedures and macros that construct new
hash functions from old ones: whenever several hash values are to be
combined, they are 'hashed together' in a deterministic but well-distributing
way (see the sketch below).
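
(A rough sketch of what such a constructor could look like, assuming hash
values are drawn from a fixed range {0..hash-bound - 1} for some prime
hash-bound; the names hash-bound, hash-multiplier and hash-combine are
made up for illustration:)

   (define hash-bound 1000003)       ; some fixed prime, for illustration
   (define hash-multiplier 31)       ; small constant, also for illustration

   (define (hash-combine . hash-values)
     ;; evaluate a polynomial in hash-multiplier over the given hash
     ;; values, reduced mod hash-bound, so the result again lies in
     ;; {0..hash-bound - 1}
     (let loop ((hs hash-values) (acc 0))
       (if (null? hs)
           acc
           (loop (cdr hs)
                 (modulo (+ (* acc hash-multiplier) (car hs))
                         hash-bound)))))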

* You might want to rethink your design for the interaction between /hash/
and /bound/. The problem is this:

A standard way to combine hash functions h_1, .., h_n is to evaluate a
modular polynomial, i.e. h(x) = Sum[h_i(x) a^i : i in {1..n}] mod m, where
m is /bound/ and a is a number in {1..m-1} such that 1, a, a^2, etc. runs
through all units of Z/mZ (i.e. a is a primitive root mod m).

Now in your design the hash function must work for all values of /bound/,
because /bound/ is chosen by the hash table and passed to the hash
function. This is a problem, because the modular polynomial method
only works for *some* values of m, namely those for which a suitable a
exists. (The first failure is m = 8.) If the modular polynomial method gets
used anyway, this is what happens: The numbers m for which it works
are low in density (e.g. it works 14888 times for m in {1..10^5}), and if a
bad m is picked then the hash distribution is certainly terrible because
the number of unused hash values is at least 50%. (The mathematical
reason is that {1, a, a^2, ..} is a proper subgroup of (Z/mZ)^x and is
therefore at most (m-1)/2 in size.)

As you see, the fact that /bound/ is an argument to the hash function
makes it difficult to combine hash functions, because hash functions
must work for all values of /bound/---which is considerably more difficult
than working for some value of /bound/.

For this reason, libraries for hash tables often use a different approach:
the hash function always returns a result in {0..hash-range - 1}, where
hash-range is a predefined constant, usually a prime close to but not
exceeding the maximal immediate integer. It is then up to hash-table-ref
etc. to map a hash value h in {0..hash-range - 1} into the index range of
the table at hand---which can be done by (h mod n), n = table-size.
If the implementation only allocates tables whose size is a power of two,
this modular reduction can even be done by masking out the higher bits
of h, as in the sketch below.
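
(To illustrate the alternative design, a sketch with made-up names; the
constant hash-range and the procedures below are not part of the draft:)

   (define hash-range 1000003)       ; a fixed prime, for illustration

   (define (string-hash/fixed s)
     ;; example of a hash function with the fixed range {0..hash-range - 1}
     (let loop ((i 0) (acc 0))
       (if (= i (string-length s))
           acc
           (loop (+ i 1)
                 (modulo (+ (* acc 31) (char->integer (string-ref s i)))
                         hash-range)))))

   (define (hash->index h table-size)
     ;; reduction into the index range of the table at hand; if table-size
     ;; is a power of two, this could be done by masking instead
     (modulo h table-size))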

* Currently, there is no portable way to make a copy of a hash table:
you cannot query /equal?/ and /hash/, and there is no hash-table-copy
either (which, by the way, might be good to add; see the sketch below).
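
(A sketch of what such a copy could look like, assuming make-hash-table
accepts the equivalence procedure as its argument and that the caller still
knows which one the original was created with; the procedure name is made
up, and the point stands that /equal?/ and /hash/ cannot be recovered from
the table itself:)

   (define (hash-table-copy/assuming table equal)
     (hash-table-fold table
                      (lambda (key value copy)
                        (hash-table-set! copy key value)
                        copy)
                      (make-hash-table equal)))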

* hash-table-count sounds like counting, but you specify it is O(1).
How about hash-table-size then?

----
Dr. Sebastian Egner
Senior Scientist Channel Coding & Modulation
Philips Research Laboratories
Prof. Holstlaan 4 (WDC 1-051, 1st floor, room 51)
5656 AA Eindhoven
The Netherlands
tel:       +31 40 27-43166   *** SINCE 10-Feb-2005 ***
fax:      +31 40 27-44004
email: sebastian.egner@xxxxxxxxxxx








srfi-69-request@xxxxxxxxxxxxxxxxx

26-04-2005 00:10

       
        To:        srfi-69@xxxxxxxxxxxxxxxxx
        cc:        (bcc: Sebastian Egner/EHV/RESEARCH/PHILIPS)
        Subject:        provided for compatibility

        Classification:        




Regarding the "provided for compatibility" aliases...

   hash-table-ref     === hash-table-get
   hash-table-set!    === hash-table-put!
   hash-table-delete! === hash-table-remove!

I think aliases like this are more appropriate in a library than in a
SRFI.

Currently, the SRFI draft would require all complying Scheme
implementations to support multiple names for the same operation, which
seems a little odd ("call/cc" notwithstanding).

I'd lean towards having the SRFI specify only one name per operation,
and having the Scheme implementations and user code comply with that.

--
                                            http://www.neilvandyke.org/