This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 are here. Eventually, the entire history will be moved there, including any new messages.
Alex Shinn <alexshinn@xxxxxxxxx> writes:

> Because the proper definition is so complicated and slow, yet there
> are many uses of strict ASCII case mapping in computer languages
> and protocols, I think it makes sense to define the core case-mapping
> procedures as ASCII-specific. Full linguistic case-handling should be
> provided by specialized library procedures which optionally accept locale,
> and only work at the string level, since single-char case-mappings are
> ill-defined.

I agree with almost everything you say, and then you say this part, with which I earnestly disagree.

As you say, we do not want 90% support in the core language, but defining ASCII-specific procedures like this is exactly a 90% solution. Actually, more like 70%, if that.

We should provide case-mapping procedures with optional locale arguments. The omission of a locale argument should be the same as giving "current-locale" or "(current-locale)" or some other global reference of the sort.

We should require a locale which represents the case mappings in UnicodeData.txt and SpecialCasing.txt. If we want another locale which represents "ASCII-specific" case mapping, fine. But don't make that normal!

If you have two ways to do things, one standard and always-supported, and one which is optional and loose, then everyone will use the standard as a matter of course. So don't make the standard the Wrong Thing. ASCII-specific case mappings are the Wrong Thing. It would be much better to have no case mapping procedures in the standard at all than to have half-assed ones.

I would therefore strongly urge the following approach:

All case-related procedures take an optional locale argument. If the locale argument is omitted, it defaults to some global variable (or the return value of a global procedure).

Specify as standard locales the unicode-default locale, and the ascii-only locale.

Specify that the default locale is initialized from the operating environment in a system-specific way. (On Posix systems, it should come from LANG.)

Network protocol agents working in known-to-be-restricted character sets can then just go ahead and switch to the ascii locale. But the default should be to just silently do the right thing for the current locale.

Thomas
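
The approach urged above (an optional locale argument, a global default, two standard locales, and initialization from LANG) could be sketched roughly as follows. This is only an illustration in Python, not anything proposed in the message or in SRFI 75: every name here (`Locale`, `UNICODE_DEFAULT`, `ASCII_ONLY`, `current_locale`, `set_current_locale`, `locale_from_environment`, `string_upcase`, `string_downcase`) is hypothetical. Python's built-in `str.upper` and `str.lower` stand in for the UnicodeData.txt and SpecialCasing.txt mappings. The rule that a LANG of "C" or "POSIX" selects the ASCII-only locale is an assumption made for the example, and no language-specific tailoring is attempted.

```python
import os
import string


class Locale:
    """Hypothetical locale object: a name plus string-level case mappings."""

    def __init__(self, name, upcase, downcase):
        self.name = name
        self.upcase = upcase
        self.downcase = downcase


# Translation tables that touch only the 26 ASCII letters.
_ASCII_UP = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_ASCII_DOWN = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# The two "standard" locales. Python's built-in mappings stand in for the
# Unicode default case mappings (an approximation chosen for this sketch).
UNICODE_DEFAULT = Locale("unicode-default", str.upper, str.lower)
ASCII_ONLY = Locale(
    "ascii-only",
    lambda s: s.translate(_ASCII_UP),
    lambda s: s.translate(_ASCII_DOWN),
)


def locale_from_environment(environ=None):
    """Pick the initial locale from LANG.

    Assumption for this sketch: "C" and "POSIX" mean ASCII-only; anything
    else (including an unset LANG) gets the Unicode default mappings.
    """
    environ = os.environ if environ is None else environ
    lang = environ.get("LANG", "")
    return ASCII_ONLY if lang in ("C", "POSIX") else UNICODE_DEFAULT


_current = None  # lazily initialized global default


def current_locale():
    """Return the global default locale, initializing it on first use."""
    global _current
    if _current is None:
        _current = locale_from_environment()
    return _current


def set_current_locale(locale):
    """Switch the global default, e.g. for an ASCII-only protocol agent."""
    global _current
    _current = locale


def string_upcase(s, locale=None):
    """Upcase s; an omitted locale means the current locale."""
    loc = current_locale() if locale is None else locale
    return loc.upcase(s)


def string_downcase(s, locale=None):
    """Downcase s; an omitted locale means the current locale."""
    loc = current_locale() if locale is None else locale
    return loc.downcase(s)


print(string_upcase("straße", UNICODE_DEFAULT))  # STRASSE
print(string_upcase("straße", ASCII_ONLY))       # STRAßE
```

A network protocol agent would call `set_current_locale(ASCII_ONLY)` once (or pass `ASCII_ONLY` explicitly), while ordinary code omits the argument and gets whatever the environment selected.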