This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.
Brad Lucier pointed out that my reference implementation of string-hash (and char-set-hash) doesn't match the comments above it. Here's an explanation of the issue.

The basic idea for the string-hash function (this is not part of the spec; it's just the particular implementation chosen for the reference implementation) is taken from Java. We treat the string cn...c0 as a radix-37 numeral (albeit one where the individual digits can be larger than 36), and use that for the hash value:

    h = (c0 + c1*37 + c2*37^2 + ...) mod bound

where BOUND is the desired range of the hash function.

Now, the expression in parens is going to be a really big number for a long string; we'd like to compute H without using a bignum package. We *could* do this:

    h := 0
    for i = n downto 0 do
      h := (c[i] + h*37) mod bound

That's a lot of mod operations, however. Instead, why not define MASK to be a string of 1-bits long enough to "cover" BOUND, that is

    mask := 2^(ceiling(lg bound)) - 1

and then just keep these low bits in the intermediate computation, finishing up with a final mod op:

    h := 0
    for i = n downto 0 do
      h := (c[i] + h*37) and mask
    h := h mod bound

Great -- that's much more efficient. But it ain't what we originally set out to compute -- it isn't computing the h value I originally wrote down. That is what Brad spotted (which I had missed).

The question is: does it matter? Is this more-quickly-computed value any less good as a hash value? I don't think so. I have not thought it through carefully, however. Anybody else have anything else to say?

    -Olin
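
For anyone who wants to poke at the difference, here is a rough Python sketch of the three computations described above. It is not the reference implementation (which is in Scheme); the function names are made up for illustration, and it assumes ord() as the character code and that the first character of the string plays the role of cn.

```python
def hash_exact(s, bound):
    """The value originally written down, computed with Python's big integers."""
    total = 0
    for ch in s:                      # s[0] is c_n, the most significant "digit"
        total = total * 37 + ord(ch)  # Horner's rule, no reduction along the way
    return total % bound


def hash_mod_each_step(s, bound):
    """Bignum-free version that reduces mod bound on every iteration."""
    h = 0
    for ch in s:
        h = (ord(ch) + h * 37) % bound
    return h


def hash_masked(s, bound):
    """Masked version: keep only the low bits, then do one final mod."""
    # (bound - 1).bit_length() equals ceiling(lg bound) for bound >= 1,
    # so mask is 2^(ceiling(lg bound)) - 1.
    mask = (1 << (bound - 1).bit_length()) - 1
    h = 0
    for ch in s:
        h = (ord(ch) + h * 37) & mask
    return h % bound
```

The per-step mod version always agrees with the exact value. The masked version agrees when bound is a power of two, since the mask is then the same as mod bound, but can differ otherwise: for the one-character string "b" (code 98) and bound 10, the exact value is 8 while the masked value is 2. Both still land in the range 0 to bound-1.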