This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.
Olin Shivers writes:
[...]
> - However, I think case-mapping and string-comparison are basic things, and
>   they can be given a generic, portable definition independent of the
>   underlying character encoding. Case-mapping does *not* require strings to be
>   well-formed text. ASCII, Latin-1 and Unicode all provide clear,
>   language-independent definitions of this operation.
>
> I don't want the string library to be minimal. I want it to be useful.
> People -- many of whom currently program with Latin-1 or ASCII Schemes --
> case-map and compare strings frequently. These operations can be provided
> with an API which is portable across ASCII, Latin-1 and Unicode. So there's
> no barrier here.

I understand your concern; many people do use ASCII and Latin-1 case mapping
and are happy with what they get from the good old char-upcase and
char-downcase. And I am not against char-upcase and char-downcase as long as
their definition is limited to ASCII; otherwise you will have to ignore three
problems mentioned in the Unicode book: uppercase I may map to either i or
dotless i (in Turkish), two uppercase letters SS may map to a single lowercase
sharp s in German, and this thing with French \'e. We are lucky that there are
just three problems with case folding, but collation is *much* worse.

My suggestion would be to restrict char-upcase, char-downcase, and their
derivatives to ASCII, and to specify explicitly that string>? and the other
comparisons are based on mechanical code-point comparison that might not
correspond to any 'natural' comparison in a real language. This approach makes
the library reasonably useful, simple to implement, and really fast.

I believe that attempting to define a language-dependent interface to
collation based on strings is wrong: collation works best when it deals with
language-specific units larger than one character, and the 'text' abstraction
suits this task much better.
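
The suggestion above can be sketched roughly as follows. This is only an illustration in Python, not SRFI 13 code: the function names (ascii_char_upcase, string_greater, and so on) are hypothetical stand-ins for char-upcase, char-downcase, their string derivatives, and string>?, and the sketch assumes that "limited to ASCII" means only the letters a-z and A-Z are ever mapped.

```python
def ascii_char_upcase(ch: str) -> str:
    """Map a-z to A-Z; return every other character unchanged."""
    return chr(ord(ch) - 32) if "a" <= ch <= "z" else ch


def ascii_char_downcase(ch: str) -> str:
    """Map A-Z to a-z; return every other character unchanged."""
    return chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch


def ascii_string_upcase(s: str) -> str:
    """String derivative: apply the ASCII-only mapping character by character."""
    return "".join(ascii_char_upcase(c) for c in s)


def ascii_string_downcase(s: str) -> str:
    """String derivative: apply the ASCII-only mapping character by character."""
    return "".join(ascii_char_downcase(c) for c in s)


def string_greater(a: str, b: str) -> bool:
    """Mechanical comparison: lexicographic order on code points,
    with no claim to match any natural-language collation."""
    return [ord(c) for c in a] > [ord(c) for c in b]
```

Under these assumptions the sharp s and accented letters pass through case mapping untouched, so the string length never changes, and "a" sorts after "Z" simply because its code point is larger.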