
case mappings

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



I agree with Bear that case-mappings are poorly defined on single
codepoints.

Michael Sperber wrote:
> I don't quite understand what you're saying: the locale-independent
> case mappings in UnicodeData.txt always map a single scalar value to a
> single scalar value.  Sure it doesn't always do what your locale
> thinks (as you point out), but this case mapping doesn't require
> "multi-codepoint characters."

This isn't just a "locale-awareness" problem.  True, for simplicity
UnicodeData.txt lists only the 1-1 mappings, but SpecialCasing.txt
includes a large number of mappings that aren't 1-1 regardless of
locale (for example, U+00DF LATIN SMALL LETTER SHARP S uppercases to
"SS").  The Unicode concept of locale-independent case mapping includes
these special cases.  Without handling them, R6RS would be using an
incomplete case-mapping rule, which is therefore not usable in the
general case.  I don't think anyone wants 90% compatibility thrown into
the core language.
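To make this concrete, here is a small sketch in Python, used only for illustration; nothing in it is an R6RS or SRFI 75 API.  It assumes Python 3, whose built-in str.upper() and str.lower() apply the full, locale-independent Unicode case mappings, including the unconditional entries from SpecialCasing.txt.  Each sample is a single code point whose case mapping produces two code points.  The helper name full_mapping is hypothetical.

```python
# Python 3's str.upper()/str.lower() apply Unicode's full, locale-independent
# case mappings, including the unconditional entries from SpecialCasing.txt.
# Each sample below is ONE code point whose case mapping is NOT 1-1.

samples = {
    "\u00DF": "upper",  # LATIN SMALL LETTER SHARP S
    "\uFB01": "upper",  # LATIN SMALL LIGATURE FI
    "\u0130": "lower",  # LATIN CAPITAL LETTER I WITH DOT ABOVE
}

def full_mapping(ch: str, direction: str) -> str:
    """Hypothetical helper: apply the full case mapping to a 1-char string."""
    return ch.upper() if direction == "upper" else ch.lower()

for ch, direction in samples.items():
    mapped = full_mapping(ch, direction)
    # Print the input code point and the code points it maps to.
    print(f"U+{ord(ch):04X} {direction}case ->",
          [f"U+{ord(c):04X}" for c in mapped])
```

This should print U+0053 U+0053 ("SS") for the sharp s, U+0046 U+0049 ("FI") for the ligature, and U+0069 U+0307 (i plus combining dot above) for the dotted capital I.  A procedure with the signature char -> char has nowhere to put the second code point.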

Because correct case mapping is so complicated and slow, yet there
are many uses of strict ASCII case mapping in computer languages
and protocols, I think it makes sense to define the core case-mapping
procedures as ASCII-specific.  Full linguistic case handling should be
provided by specialized library procedures that optionally accept a
locale and work only at the string level, since single-char case
mappings are ill-defined.
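A rough sketch of the proposed split, again in Python purely for illustration.  All procedure names here are hypothetical and are not R6RS or SRFI 75 procedures; the locale parameter is left out because Python's str.upper() is locale-independent.  The ASCII procedures are 1-1 and length-preserving, while the full mapping exists only at the string level and may change the length.

```python
# Hypothetical names throughout: these are NOT R6RS or SRFI 75 procedures.

def ascii_char_upcase(ch: str) -> str:
    """Map a-z to A-Z; every other character is returned unchanged (1-1)."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch

def ascii_char_downcase(ch: str) -> str:
    """Map A-Z to a-z; every other character is returned unchanged (1-1)."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch

def ascii_string_upcase(s: str) -> str:
    """Character-by-character ASCII mapping; length is always preserved."""
    return "".join(ascii_char_upcase(c) for c in s)

def full_string_upcase(s: str) -> str:
    """String-level full Unicode mapping; the result may be longer than s.
    (A real library version would optionally take a locale; omitted here.)"""
    return s.upper()

word = "stra\u00DFe"                 # "straße", 6 characters
print(ascii_string_upcase(word))    # STRAßE  (ß untouched, still 6 chars)
print(full_string_upcase(word))     # STRASSE (7 chars)
```

The ASCII versions are cheap and predictable, which is what a protocol keyword comparison wants; anything that needs real linguistic behavior goes through the string-level procedure.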

char-title-case? would then no longer be needed.

-- 
Alex