This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.
    From: "Sergei Egorov" <esl@xxxxxxxxxxxxxxx>

    I understand your concern; many people do use ASCII and Latin-1 case
    mapping and are happy with what they get from the good old char-upcase
    and char-downcase. And I am not against char-upcase and char-downcase
    as long as their definition is limited to ASCII; otherwise you will have
    to ignore three problems mentioned in the Unicode book: uppercase I may
    map to either i or dotless i (in Turkish), two uppercase letters SS may
    map to a single lowercase sharp s in German, and this thing with
    French \'e. We are lucky that there are just three problems with case
    folding, but collation is *much* worse.

    My suggestion would be to restrict char-upcase, char-downcase, and their
    derivatives to ASCII and explicitly specify that string>? and other
    comparisons are based on mechanical code-point comparison that might not
    correspond to any 'natural' comparison in a real language. This approach
    makes the library reasonably useful, simple to implement, and really fast.

    I believe that attempting to define a language-dependent interface to
    collation based on strings is wrong: collation works best when it deals
    with language-specific units larger than one character, and the 'text'
    abstraction suits this task much better.

Wait wait wait -- I am *not* proposing CHAR-UPCASE and CHAR-DOWNCASE. These procedures are *not* part of SRFI-13. You are quite right -- they have real problems with non-ASCII char encodings.

What is in SRFI-13 is

    STRING-UPCASE
    STRING-DOWNCASE
    STRING-TITLECASE

These can handle the various issues involved in case-mapping text (e.g., upcasing German eszett expanding to 2 chars, Greek sigma downcasing in a context-dependent way, titlecasing compound chars like "fi" or "dz"). No problem. Unicode TR 21 explains clearly and carefully how to do it for Unicode.

Note also that I punted the side-effecting STRING-UPCASE! et al. because of the one-char->two-char case mapping issues.

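The string->string behaviour described above can be illustrated with a minimal sketch. It uses Python's built-in str methods purely as a stand-in for full Unicode case mapping; they are not the SRFI-13 procedures, and the variable names are made up for this example.

```python
# Illustration only: Python's built-in str methods, not SRFI-13 procedures.

# German sharp s: upcasing expands one character into two.
street = "stra\u00dfe"                 # "straße"
street_up = street.upper()             # the string grows by one character

# Greek sigma: downcasing depends on position within the word.
wise = "\u03a3\u039f\u03a6\u039f\u03a3"    # "ΣΟΦΟΣ"
wise_down = wise.lower()               # medial sigma first, final sigma last

# Compound characters: titlecasing the "fi" ligature and the "dz" digraph.
fish = "\ufb01sh"                      # U+FB01 LATIN SMALL LIGATURE FI + "sh"
fish_title = fish.title()
dz_title = "\u01c6".title()            # U+01C6, small dz with caron

print(len(street), len(street_up))     # the lengths differ, so an in-place
                                       # upcase cannot work in general
print(wise_down, fish_title, dz_title)
```

Because the result can be longer than the input, a side-effecting variant that overwrites the string in place has no general way to fit the result into the original storage.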
Your general point about these operations no longer being simply char->char, but being string->string or text->text is right on the money. However, I have nothing intelligent to say about collation and string comparison in the wide Unicode world today. If I can't come up with something reasonable that works in ASCII, Latin-1 *and* a Unicode setting, I'll punt the string-comparison functions, which I think would be a huge blow to the library.
    -Olin
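
For reference, the ASCII-only case mapping and mechanical code-point comparison suggested in the quoted message might look roughly like the sketch below. The helper names ascii_upcase and code_point_less are hypothetical, chosen for this illustration; nothing here is taken from SRFI-13.

```python
# Hypothetical helpers illustrating the quoted suggestion; not SRFI-13 code.

def ascii_upcase(s: str) -> str:
    """Upcase only ASCII a-z; every other code point is left untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)

def code_point_less(a: str, b: str) -> bool:
    """Compare two strings element by element on raw code-point values."""
    return [ord(c) for c in a] < [ord(c) for c in b]

words = ["zebra", "Zebra", "apple", "\u00c4pfel"]   # last word is "Äpfel"

# Python's default string ordering is also by code point, so sorted()
# shows what a purely mechanical comparison produces.
mechanical_order = sorted(words)

print(ascii_upcase("stra\u00dfe"))   # the sharp s passes through unchanged
print(mechanical_order)              # capitals first, "Äpfel" after "zebra"
```

The ordering is fast and simple to specify, but it puts "Zebra" before "apple" and "Äpfel" after "zebra", which matches no dictionary order in English or German.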