Re: case mappings
Alex Shinn <alexshinn@xxxxxxxxx> writes:
> Because the proper definition is so complicated and slow, yet there
> are many uses of strict ASCII case mapping in computer languages
> and protocols, I think it makes sense to define the core case-mapping
> procedures as ASCII-specific. Full linguistic case-handling should be
> provided by specialized library procedures which optionally accept locale,
> and only work at the string level, since single-char case-mappings are
I agree with almost everything you say, and then you say this part,
with which I earnestly disagree. As you say, we do not want 90%
support in the core language, but defining ASCII-specific procedures
like this is exactly a 90% solution. Actually, more like 70%, if
We should provide case-mapping procedures with optional locale
arguments. Omitting the locale argument should be equivalent to
passing "current-locale" or "(current-locale)" or some other global
reference of that sort.
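A minimal sketch of that calling convention follows, in Python rather than Scheme. Everything in it is hypothetical and illustrative: the `Locale` class, the `current_locale` procedure, and `string_upcase` are invented names, and the "unicode-default" locale is approximated by Python's built-in `str.upper`.

```python
from typing import Callable, Optional


class Locale:
    """Hypothetical locale object: a name plus a string-upcasing function."""

    def __init__(self, name: str, upcase: Callable[[str], str]) -> None:
        self.name = name
        self.upcase = upcase


# str.upper applies Unicode's default full case mappings.
UNICODE_DEFAULT = Locale("unicode-default", str.upper)
# ASCII-only mapping: touch a-z, leave every other character alone.
ASCII = Locale(
    "ascii",
    lambda s: "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s),
)

# The global default that an omitted locale argument refers to.
_current_locale = UNICODE_DEFAULT


def current_locale() -> Locale:
    return _current_locale


def string_upcase(s: str, locale: Optional[Locale] = None) -> str:
    # Omitting the locale is the same as passing (current-locale).
    if locale is None:
        locale = current_locale()
    return locale.upcase(s)


print(string_upcase("straße"))         # STRASSE
print(string_upcase("straße", ASCII))  # STRAßE
```

The point of the design is that callers who do not care get the correct behavior by default, while callers who do care can pass a locale explicitly.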
We should require a locale which represents the case mappings in
UnicodeData.txt and SpecialCasing.txt. If we want another locale
which represents "ASCII-specific" case mapping, fine. But don't make
If you have two ways to do things, one standard and always-supported,
and one which is optional and loose, then everyone will use the
standard as a matter of course. So don't make the standard the Wrong
Thing. ASCII-specific case mappings are the Wrong Thing.
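To make the gap concrete, here is a small comparison, sketched in Python as an illustration only. `ascii_upcase` is a hypothetical helper that maps only a-z. `unicode_upcase` simply calls Python's `str.upper`, which applies Unicode's default (locale-independent) full case mapping, including the unconditional rules from SpecialCasing.txt.

```python
def ascii_upcase(s: str) -> str:
    """Upcase only the ASCII letters a-z; leave every other character alone."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def unicode_upcase(s: str) -> str:
    """Default Unicode full case mapping, as implemented by str.upper."""
    return s.upper()


for word in ["hello", "café", "straße"]:
    print(word, ascii_upcase(word), unicode_upcase(word))
# hello HELLO HELLO
# café CAFé CAFÉ
# straße STRAßE STRASSE

# One character can map to several: a char -> char procedure cannot express this.
print(len(unicode_upcase("ß")))  # 2
```

The ASCII version is only "right" for inputs that happen to be pure ASCII, and the last line shows why full case mapping has to work on strings rather than single characters.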
It would be much better to have no case mapping procedures in the
standard at all than to have half-assed ones.
I would therefore strongly urge the following approach:
All case-related procedures take an optional locale argument.
If the locale argument is omitted, it defaults to some global variable
(or the return value of a global procedure).
Specify as standard locales the unicode-default locale, and the
Specify that the default locale is initialized from the operating
environment in a system-specific way. (On POSIX systems, it should
come from LANG.)
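One possible way to initialize the default locale on a POSIX system is sketched below in Python, under several assumptions. The post names LANG. POSIX also lets LC_ALL and then LC_CTYPE override LANG for character handling, so this sketch checks those first. Mapping the portable "C"/"POSIX" locale to the ascii locale is a design choice of this sketch, not something the post specifies. `AVAILABLE_LOCALES` and `locale_from_environment` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Hypothetical set of locales an implementation ships: the two standard
# ones, plus any optional tailorings it chooses to add.
AVAILABLE_LOCALES = {"unicode-default", "ascii"}


def locale_from_environment(env: Optional[Mapping[str, str]] = None) -> str:
    """Choose the initial default locale from POSIX environment variables."""
    env = os.environ if env is None else env
    # POSIX precedence for character handling: LC_ALL, then LC_CTYPE, then LANG.
    value = ""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            break
    # Assumption: the portable "C"/"POSIX" locale selects ASCII-only mappings.
    if value in ("C", "POSIX"):
        return "ascii"
    # Drop the codeset and modifier, e.g. "de_DE.UTF-8@euro" -> "de_DE".
    name = value.split(".", 1)[0].split("@", 1)[0]
    # Unset or unknown: fall back to the Unicode default mappings.
    return name if name in AVAILABLE_LOCALES else "unicode-default"


print(locale_from_environment())
```

Falling back to "unicode-default" for anything unrecognized keeps the default on the correct side, which is the post's main concern.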
Network protocol agents working in known-to-be-restricted character
sets can then just go ahead and switch to the ascii locale. But the
default should be to just silently do the right thing for the current