
Re: case mappings



Alex Shinn <alexshinn@xxxxxxxxx> writes:

> Because the proper definition is so complicated and slow, yet there
> are many uses of strict ASCII case mapping in computer languages
> and protocols, I think it makes sense to define the core case-mapping
> procedures as ASCII-specific.  Full linguistic case-handling should be
> provided by specialized library procedures which optionally accept locale,
> and only work at the string level, since single-char case-mappings are
> ill-defined.

I agree with almost everything you say, and then you say this part,
with which I earnestly disagree.  As you say, we do not want 90%
support in the core language, but defining ASCII-specific procedures
like this is exactly a 90% solution.  Actually, more like 70%, if
that.

We should provide case-mapping procedures with optional locale
arguments.  Omitting the locale argument should be equivalent to
passing "current-locale", "(current-locale)", or some other global
reference of that sort.

We should require a locale which represents the case mappings in
UnicodeData.txt and SpecialCasing.txt.  If we want another locale
which represents "ASCII-specific" case mapping, fine.  But don't make
that normal!
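To make the gap concrete, here is a rough Python sketch comparing an
ASCII-only upcase with the Unicode default mapping.  The function
names are made up for illustration; the one real thing being relied
on is that Python's str.upper() applies the language-independent full
case mappings from UnicodeData.txt and SpecialCasing.txt.

```python
def ascii_upcase(s: str) -> str:
    """Hypothetical ASCII-only upcase: map a-z to A-Z, pass everything else through."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def unicode_upcase(s: str) -> str:
    """Unicode default full case mapping, via Python's built-in str.upper()."""
    return s.upper()


word = "straße café"
print(ascii_upcase(word))    # STRAßE CAFé  (non-ASCII letters silently left alone)
print(unicode_upcase(word))  # STRASSE CAFÉ

# One character can upcase to two, which is also why per-character
# case mapping is ill-defined.
print(len("ß"), len(unicode_upcase("ß")))  # 1 2
```

The ASCII version does not fail loudly; it just quietly produces the
wrong answer for any text that is not pure ASCII.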

If you have two ways to do things, one standard and always-supported,
and one which is optional and loose, then everyone will use the
standard as a matter of course.  So don't make the standard the Wrong
Thing.  ASCII-specific case mappings are the Wrong Thing.

It would be much better to have no case mapping procedures in the
standard at all than to have half-assed ones.

I would therefore strongly urge the following approach:

All case-related procedures take an optional locale argument.
If the locale argument is omitted, it defaults to some global variable 
  (or the return value of a global procedure).
Specify as standard locales the unicode-default locale, and the
  ascii-only locale.
Specify that the default locale is initialized from the operating
  environment in a system-specific way.  (On POSIX systems, it should
  come from LANG.)
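For concreteness, a minimal Python sketch of the shape of this
proposal.  Everything here is hypothetical: the Locale record, the
names UNICODE_DEFAULT and ASCII_ONLY, current_locale, and
string_upcase.  The environment lookup is an assumption too: it reads
only LANG and treats the "C"/"POSIX" locales as ascii-only.  A real
implementation would also select language-specific tailorings (e.g.
Turkish dotted i) from LANG, which this sketch does not attempt.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Locale:
    """Hypothetical locale object: a name plus its upcase function."""
    name: str
    upcase: Callable[[str], str]


def _ascii_upcase(s: str) -> str:
    # Only a-z are mapped; every other character passes through.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


# The two standard locales from the proposal (names are illustrative).
UNICODE_DEFAULT = Locale("unicode-default", str.upper)
ASCII_ONLY = Locale("ascii-only", _ascii_upcase)


def _locale_from_environment() -> Locale:
    # Assumption: consult only LANG; "C"/"POSIX" means ascii-only,
    # anything else (including unset) means unicode-default.
    lang = os.environ.get("LANG", "")
    return ASCII_ONLY if lang in ("C", "POSIX") else UNICODE_DEFAULT


# Global default, initialized once from the operating environment.
_current_locale = _locale_from_environment()


def current_locale() -> Locale:
    return _current_locale


def string_upcase(s: str, locale: Optional[Locale] = None) -> str:
    """Omitting the locale is the same as passing current_locale()."""
    if locale is None:
        locale = current_locale()
    return locale.upcase(s)


print(string_upcase("straße", UNICODE_DEFAULT))  # STRASSE
print(string_upcase("straße", ASCII_ONLY))       # STRAßE
```

A protocol agent that knows its input is restricted to ASCII passes
ASCII_ONLY explicitly; every other caller just omits the argument and
gets whatever the environment says.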

Network protocol agents working in known-to-be-restricted character
sets can then just go ahead and switch to the ascii-only locale.  But
the default should be to just silently do the right thing for the
current locale.

Thomas