This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.
> ... collation and string
> comparison in the wide Unicode world today. If I can't come up with something
> reasonable that works in ASCII, Latin-1 *and* a Unicode setting

The STRING>? problem under Unicode differs from the problem under Latin-1 only in degree. (Finns and Swedes use a different collation sequence from Danes and Norwegians. Æ is merely a ligature of "AE" in English, but a separate letter in Danish. Modern Spanish, French, and Traditional Spanish all collate differently. And there is much, much more.) Hence even under Latin-1, STRING>? must take the domain language into account. Unicode merely makes more scripts, and so more languages, convenient.

Proposal: The string comparators take an optional final argument that is not a string but a value of a new type, language-specifier (abbrev. langid), which specifies the language of a block of text.

The procedure CURRENT-LANGUAGE returns the langid that Scheme uses for string comparators called without this optional final argument. Scheme initially uses a default langid inherited from its host environment; the procedure DEFAULT-LANGUAGE returns the langid for this default. The procedures (CALL-WITH-LANGUAGE langid proc) and (WITH-LANGUAGE langid thunk) change the value returned by CURRENT-LANGUAGE during the call to proc or thunk.

Finally, the procedure LANGUAGE takes an ISO 639 language code as a string and returns the corresponding langid. LANGUAGE may be extended to accept other values (perhaps a numeric language code from the host OS).

This would allow correct collation of text using the current Scheme notion of "string." Building a higher-level "text" abstraction on top of it is purely mechanical.

Ben
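To make the proposed semantics concrete, here is a minimal sketch in Python (not Scheme) of how a langid, a default language, a dynamically rebindable current language, and comparators with an optional final langid argument could fit together. Everything in it is an illustrative assumption: the names `LangId`, `language`, `default_language`, `current_language`, `with_language`, `string_lt` and `string_gt` merely mirror the proposed procedures; English stands in for the host-supplied default; and the collation tables are toy tables that only place the Nordic letters after z (with the simplifying assumption that Swedish weighs æ like ä and ø like ö, and Danish the reverse). Real collation, for example the Unicode Collation Algorithm, is far more involved.

```python
import string
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class LangId:
    """Hypothetical language specifier ("langid") wrapping an ISO 639 code."""
    code: str


def _weights(tail_groups: List[str]) -> Dict[str, int]:
    """Toy collation table: a-z first, then each group of letters after z.
    Letters in the same group share one weight (a simplifying assumption)."""
    table = {c: i for i, c in enumerate(string.ascii_lowercase)}
    for weight, group in enumerate(tail_groups, start=len(string.ascii_lowercase)):
        for c in group:
            table[c] = weight
    return table


_SWEDISH = _weights(["å", "äæ", "öø"])  # Swedish/Finnish: z < å < ä < ö
_DANISH = _weights(["æä", "øö", "å"])   # Danish/Norwegian: z < æ < ø < å
_TABLES = {"en": _weights([]), "sv": _SWEDISH, "fi": _SWEDISH,
           "da": _DANISH, "no": _DANISH}


def language(code: str) -> LangId:
    """Map an ISO 639 code to a langid (only the toy codes above)."""
    code = code.lower()
    if code not in _TABLES:
        raise ValueError(f"no collation table for {code!r}")
    return LangId(code)


_DEFAULT = language("en")  # hypothetical stand-in for the host's default
_current = ContextVar("current_language", default=_DEFAULT)


def default_language() -> LangId:
    return _DEFAULT


def current_language() -> LangId:
    return _current.get()


def with_language(langid: LangId, thunk: Callable[[], T]) -> T:
    """Call thunk with current_language() rebound to langid, then restore it."""
    token = _current.set(langid)
    try:
        return thunk()
    finally:
        _current.reset(token)


def _key(s: str, langid: LangId) -> List[int]:
    table = _TABLES[langid.code]
    # Characters missing from the table sort after every tabled letter.
    return [table.get(c, 1000 + ord(c)) for c in s.lower()]


def string_lt(a: str, b: str, langid: Optional[LangId] = None) -> bool:
    """a < b under langid, or under current_language() if it is omitted."""
    lang = langid if langid is not None else current_language()
    return _key(a, lang) < _key(b, lang)


def string_gt(a: str, b: str, langid: Optional[LangId] = None) -> bool:
    """a > b under langid, or under current_language() if it is omitted."""
    return string_lt(b, a, langid)
```

With these toy tables, `string_lt("ål", "øl", language("sv"))` is true while `string_lt("ål", "øl", language("da"))` is false: the same pair of strings orders differently depending on the language argument, which is why the comparator needs it. The `ContextVar` plays the role of the dynamic binding WITH-LANGUAGE would establish; in Scheme, a parameter object with `parameterize` (SRFI 39, R7RS) would be a natural fit.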