This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Alex Shinn <alexshinn@xxxxxxxxx> writes:

> It would be nice to provide at least a place-holder for locales, but
> this does open another can of worms. What is a locale? In the
> implementation I provide for Chicken and Gauche it's just a string,
> but some schemes might want locale objects. Furthermore, there's
> probably a (current-locale).

I don't think this has any worms at all. (current-locale) or current-locale is fine; I'm not sure which is right. But we don't need to standardize it, so don't! Just provide some guaranteed standard locale values if you want them.

> Given that, does
>
>   (string-ci=? s1 s2)
>
> mean the same thing as
>
>   (string-ci=? s1 s2 (current-locale))
>
> or the same as
>
>   (string-ci=? s1 s2 (independent-locale))

It should be current-locale by default, without any doubt whatsoever. I said this earlier in the thread, but I think it got lost.

I do not envision simultaneously using different encodings inside one character set. My vision of a fancy-ass Unicode-compliant Scheme system would have it that a "character" is a Unicode character. ASCII characters are not Unicode characters; they would probably be just integers or octets or what-have-you.

The problem here is *precisely* that people are thinking "operating on a series of octets" is the same basic thing as "operating on text". That's C-programmer thinking. :)

An incoming email message is not a series of characters. An ISO Latin-1 encoded file is not a series of characters. Both of these are series of *octets*, strings of *bytes*, and a mapping is necessary to turn them into a series of characters. In the case of the email message, the headers are supposed to be in ASCII, with some embedded mappings allowed, and the body is in a mapping specified by a tag in the headers.

Reading such a message is *not* a matter of taking the octets, turning them into characters with integer->char, and then operating on the resulting "string". No.
It's a matter of taking the octets and *interpreting* them, indeed *translating* them, into strings. And the strings you get at the end of that operation are *Unicode* strings.

That's the kind of system I think I want. I certainly don't expect it to be mandated, but I don't want it prohibited either. It gets prohibited the instant you start requiring operations on *characters* which only make sense for this or that *encoding*.

I imagine a function (ascii->string ...) which takes an array of octets and returns a string: a string of *characters*, each of which is a Unicode character. There can be (latin-1->string ...) and (latin-2->string ...) and so forth too.

If you want special functions to operate on ASCII, that's fine. But ASCII is an encoding, so ASCII-operating functions should operate only on encoded data (octets or integers), not on characters. If you want (ascii-upcase ...) which takes an *integer* and returns another *integer*, I don't object, though I will request language making clear that this is for specialized uses, and that doing things like (integer->char (ascii-upcase (char->integer FOO))) is almost certainly wrong; people should be told to use string-upcase instead.

This, I think, makes the most sense. It's the Right Thing for Unicode if you really want to go whole hog and do it all, and Scheme standards should not be written in such a way that it's essentially impossible. The chief obstacle is the tedious writing of functions that "everyone wants" but which preclude the use of the character type to represent Unicode characters.

Thomas
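A minimal sketch of the octets-versus-characters split argued for in this message, written in Python only because its bytes and str types already draw the same line. The names ascii_to_string, latin1_to_string, latin2_to_string and ascii_upcase are hypothetical stand-ins for the proposed (ascii->string ...), (latin-1->string ...), (latin-2->string ...) and (ascii-upcase ...); they are not part of any Scheme standard or SRFI. The demo shows the same octets decoding to different text under different encodings, and the integer round-trip the message warns against missing a non-ASCII letter that string-level case mapping handles.

```python
# Sketch: octets are decoded into Unicode text by an explicit, named
# encoding; case mapping is then done on text, not on octet values.
# Function names are hypothetical analogues of the Scheme procedures
# proposed in the message; they are not from any standard library.


def ascii_to_string(octets: bytes) -> str:
    """Like the proposed (ascii->string ...): every octet must be < 128."""
    return octets.decode("ascii")  # raises UnicodeDecodeError on octets >= 128


def latin1_to_string(octets: bytes) -> str:
    """Like the proposed (latin-1->string ...): ISO 8859-1 decoding."""
    return octets.decode("latin-1")


def latin2_to_string(octets: bytes) -> str:
    """Like the proposed (latin-2->string ...): ISO 8859-2 decoding."""
    return octets.decode("iso8859_2")


def ascii_upcase(code: int) -> int:
    """Like the proposed (ascii-upcase ...): integer in, integer out.

    Only a..z (0x61..0x7A) are changed; every other value is returned as-is.
    """
    if 0x61 <= code <= 0x7A:
        return code - 0x20
    return code


if __name__ == "__main__":
    raw = b"caf\xea"  # the same four octets...
    print(latin1_to_string(raw))  # ...read as Latin-1: "cafê"
    print(latin2_to_string(raw))  # ...read as Latin-2: "cafę"

    text = latin1_to_string(b"\xe9t\xe9")  # "été"
    # The integer round-trip the message warns about silently skips é:
    wrong = "".join(chr(ascii_upcase(ord(c))) for c in text)
    print(wrong)         # "éTé"
    print(text.upper())  # "ÉTÉ": case mapping done on characters
```

Latin-1 is the one encoding where octet value and code point coincide (it occupies U+0000 through U+00FF), which is why treating octets as characters can look correct on Latin-1 data and then break on anything else.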