This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
John Cowan's misunderstanding is critical (and TB's reply is well wide of the mark).

John wrote:

> SRFI-75 in no way prevents that. It simply says what string<? and
> its friends mean. You can still provide string-uca-simple<? and
> string-uca-locale<? if you want.

No, SRFI-75 does not "simply" do any such thing. Unfortunately. There are logical entanglements within the standard that mean you can't tweak the definition of the string functions without simultaneously tweaking things like surface syntax.

The code-point-wise and CaseMapping.txt-wise character and string functions are perfectly well defined: perfect additions to the Scheme diamond. They are low level, true. They aren't the best way to process natural-language Unicode text, sure. But they are also foundational to perfect Unicode support. That much is just fine, and John is right about that.

The problem that TB should have called out there, but didn't (though I'm pretty sure he knows about it), is that the character and string primitives are not really orthogonal to either the surface syntax of the language or to communication between Scheme programs using the standard READ and WRITE procedures.

To be more explicit:

a) The definitions of the character and string functions must be consistent with the surface syntax of the language. (The wording of the standard is a little weaselly on this point, but that is the simplest interpretation and the one most consistent with Scheme's heritage of meta-circular programming techniques.)

Therefore, if the character and string functions are "crude" with respect to natural language, then an implementation *cannot* (cleanly, simply) allow identifier names which are globally-natural-language-friendly except in a crude way. We should have the goal of implementations which are not culturally biased: implementations which support all languages equally, or at least certain non-English languages perfectly. If we force identifiers to be crude in that way today, we cannot achieve the larger goal tomorrow without breaking things.

b) An analogous argument applies to the streams emitted and consumed by READ and WRITE. (This isn't *really* a separate point from (a), but people commonly treat it that way.)

It's the surface syntax that is in peril, not any vague notion of what is "encouraged" or of what the right name is for this or that function. It doesn't matter that programs using the standard string procedures as currently specified aren't right for some natural-language applications. As John says, that's what new functions are for. The only problem with SRFI-75 lies in the (one hopes unintended) implications it has for all future upward-compatible syntaxes (for code and data, if you regard them as separate).

-t