
char/char-set/pred

This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.



   From: David Rush <drush@xxxxxxxxxxxx>
   - char-set params; wrap 'em up in your own lambda, please. Is
     it really more efficient to do this in the library, when
     R5RS doesn't support the data type anyway?

    From: "Sergei Egorov" <esl@xxxxxxxxxxxxxxx>
    ;;; CHAR/CHAR-SET/PRED parameters:
    In my opinion, this is an example of ad-hoc genericity:
    the choice of variants is more or less arbitrary (why
    STRING or CHAR-LIST are missing? How can I specify 
    -ci search for a char?) The whole idea does not fail 
    only because strings cannot contain char-sets or 
    procedures (this trick doesn't work with lists or
    vectors). I agree that something should be done 
    to stop the namespace pollution, but there are other
    ways: regular higher-order procedures. Besides,
    the CHAR/CHAR-SET/PRED approach is another slippery slope: 
    why don't we just define generic sequence procedures?

Not slippery at all. CHAR is the one squirrely case; I threw it in
because it is an important common case. You really have two general
ways to specify an arbitrary set of characters: as a predicate, and
as a character set. So these should be supported.
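
To make the three-way parameter concrete, here is a minimal sketch of how a library routine might tell the cases apart. It assumes an R7RS system that provides SRFI 14; the name criterion->predicate is hypothetical and is not part of SRFI 13. A real implementation would probably keep the char-set case separate rather than wrapping it in a lambda, so that it can exploit the cheap membership test.

```scheme
(import (scheme base) (scheme char) (srfi 14))

;; Hypothetical helper: normalize a char/char-set/pred argument
;; into a one-argument predicate on characters.
(define (criterion->predicate criterion)
  (cond ((char? criterion)
         ;; Single character: match exactly that character.
         (lambda (c) (char=? c criterion)))
        ((char-set? criterion)
         ;; Character set: membership test.
         (lambda (c) (char-set-contains? criterion c)))
        ((procedure? criterion)
         ;; Already a predicate: use it unchanged.
         criterion)
        (else
         (error "expected a char, char-set, or predicate" criterion))))
```

The dispatch is unambiguous because the three types are disjoint in this sketch's assumptions: a value is tested as a char first, then as a char-set, then as a procedure.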

Let me also make a general point about char-sets. It's an important and
valuable property that membership testing with them is known to be fast & free
of side-effects. Consider, for example, STRING-FILTER. It needs to allocate a
target string in which to store the results. How long should this string be?
No problem: make one pass over the source string and just count the hits, then
allocate the string, then make another pass and install the hits.
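
The two-pass strategy just described can be sketched as follows. This is an illustration only, assuming an R7RS system with SRFI 14; char-set-filter is a hypothetical name, not the SRFI 13 reference code, and it omits the optional start/end arguments.

```scheme
(import (scheme base) (srfi 14))

;; Hypothetical helper: keep only the characters of S that are in CS.
;; Testing membership twice per character is safe because
;; char-set-contains? is cheap and has no side effects.
(define (char-set-filter cs s)
  (let ((len (string-length s)))
    ;; Pass 1: count the hits.
    (let count ((i 0) (hits 0))
      (if (< i len)
          (count (+ i 1)
                 (if (char-set-contains? cs (string-ref s i))
                     (+ hits 1)
                     hits))
          ;; Allocate the result once, at exactly the right length.
          (let ((result (make-string hits)))
            ;; Pass 2: install the hits.
            (let fill ((i 0) (j 0))
              (cond ((= i len) result)
                    ((char-set-contains? cs (string-ref s i))
                     (string-set! result j (string-ref s i))
                     (fill (+ i 1) (+ j 1)))
                    (else (fill (+ i 1) j)))))))))

(char-set-filter char-set:digit "a1b2c3")   ; => "123"
```

Apart from the result string itself, nothing is allocated.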

Now let's consider using general Scheme lambdas -- predicates. You *cannot*
write the code this way. You must instead allocate a list or save the
results on the stack, because you may not apply the testing predicate
more than once per character. It might have side effects. Or it might take
five minutes to run on every application, so doubling the number of calls
is a terrible idea. You just don't know.
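
For contrast, here is a sketch of the single-pass version that an arbitrary predicate forces on you. It is plain R7RS; pred-filter and counting-digit? are hypothetical names chosen for this example. The predicate is applied exactly once per character, and the price is one cons cell for every character kept.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical helper: keep only the characters of S satisfying PRED.
;; PRED is called once per character, so the hits must be buffered
;; in a list until the final length is known.
(define (pred-filter pred s)
  (let ((len (string-length s)))
    (let loop ((i 0) (kept '()))
      (if (< i len)
          (let ((c (string-ref s i)))
            (loop (+ i 1)
                  (if (pred c) (cons c kept) kept)))
          ;; The list was built back to front, so reverse it.
          (list->string (reverse kept))))))

;; A predicate with a side effect: it counts its own calls.
(define calls 0)
(define (counting-digit? c)
  (set! calls (+ calls 1))
  (char-numeric? c))

(pred-filter counting-digit? "a1b2c3")   ; => "123", and calls is now 6
```

Had the two-pass strategy been used here, the counter would read 12 instead of 6, which is exactly the kind of observable difference a library may not introduce.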

This means that predicate-based filtering can allocate a lot of garbage just
to do its work. Imagine filtering a megabyte string -- you could end up
allocating 8MB or 12MB of list storage. Bogus!

Now, while the efficiency advantages of char-sets are clear, we nonetheless
want to support predicates in the general case, because this is Scheme, and
using predicates as selectors is an important paradigm to support.

So there you have it. Chars, char-sets, and predicates -- the parameter was
very carefully limited to exactly those three.
    -Olin