
Re: the discussion so far

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

Matthew Flatt <mflatt@xxxxxxxxxxx> writes:

> A similar line of reasoning applies to the other operations. In
> contrast, a `string-ci=?' based on the Unicode collation algorithm,
> while certainly a better approximation, seems like too much of an
> implementation burden to be in the SRFI.

Note that collation is for string sorting - i.e. STRING<? and
friends - while STRING-CI=? should use case folding.

String collation is very complex, as the "preferred" order of
characters depends on the locale. But since STRING<? and friends
are often used for things like binary search trees where the exact
order is irrelevant and the only important thing is the existence
of any kind of total order, defining them the way this SRFI does -
by using the codepoint sequence - is good, because it is fast. If
the implementation wants to provide the locale-dependent string
collation, fine, but that's not useful for this SRFI to define.
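
To illustrate the point about code point order, here is a minimal sketch in Python (chosen only for illustration, not Scheme). The helper names `codepoint_key`, `string_lt`, and `contains` are hypothetical and not part of the SRFI. It shows that comparing code point sequences yields an order that may look odd to a human reader but is still a total order, which is all a binary search needs.

```python
import bisect

def codepoint_key(s: str) -> tuple:
    """Return the string as a tuple of code points (hypothetical helper)."""
    return tuple(ord(ch) for ch in s)

def string_lt(a: str, b: str) -> bool:
    """Rough analogue of STRING<?: lexicographic comparison of code points."""
    return codepoint_key(a) < codepoint_key(b)

def contains(sorted_words: list, word: str) -> bool:
    """Binary search; it only needs a consistent total order, not a 'nice' one."""
    keys = [codepoint_key(w) for w in sorted_words]
    target = codepoint_key(word)
    i = bisect.bisect_left(keys, target)
    return i < len(keys) and keys[i] == target

# Sample data, made up for this example.
words = ["zebra", "Z\u00fcrich", "apple", "\u00e9clair", "Apple"]
ordered = sorted(words, key=codepoint_key)
```

Here `ordered` puts the uppercase words before the lowercase ones and the accented word last, which no locale would prefer, yet lookups with `contains` still work.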

In contrast, case folding is available for Unicode as a simple
table which maps codepoints to the case-folded variant. There are
two tables: The simple case folding maps a single codepoint to a
single codepoint, while the full case folding table maps a single
codepoint to one or more codepoints.
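
The difference between the two tables can be sketched as follows, again in Python for illustration only. `SIMPLE_FOLD` and `FULL_FOLD` are tiny hand-written excerpts in the spirit of the Unicode case folding data (ASCII letters plus the two sharp s characters), not the real tables, and `fold` and `string_ci_eq` are hypothetical names.

```python
# Simple folding: one code point maps to exactly one code point.
SIMPLE_FOLD = {chr(c): chr(c + 32) for c in range(ord("A"), ord("Z") + 1)}
SIMPLE_FOLD["\u1e9e"] = "\u00df"   # capital sharp s -> small sharp s

# Full folding: one code point may map to several code points.
FULL_FOLD = dict(SIMPLE_FOLD)
FULL_FOLD["\u00df"] = "ss"         # small sharp s -> "ss"
FULL_FOLD["\u1e9e"] = "ss"         # capital sharp s -> "ss"

def fold(s: str, table: dict) -> str:
    """Case-fold a string by looking up each character in the given table."""
    return "".join(table.get(ch, ch) for ch in s)

def string_ci_eq(a: str, b: str, table: dict) -> bool:
    """Rough analogue of STRING-CI=?: compare the case-folded strings."""
    return fold(a, table) == fold(b, table)

# Full folding equates these; simple folding does not.
full_result = string_ci_eq("STRASSE", "Stra\u00dfe", FULL_FOLD)
simple_result = string_ci_eq("STRASSE", "Stra\u00dfe", SIMPLE_FOLD)
```

With these excerpts, the full table treats "STRASSE" and "Straße" as equal while the simple table does not. Either way, the operation is a per-character table lookup.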

Since Unicode support requires such lookup tables for just about
everything, including downcasing, using the case folding table is
not much of an extra burden.

        -- Jorgen

((email . "forcer@xxxxxxxxx") (www . "http://www.forcix.cx/")
 (gpg   . "1024D/028AF63C")   (irc . "nick forcer on IRCnet"))