This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.
Tom Lord <lord@xxxxxxx> writes:

> > Wrong. A Scheme character should be a codepoint. The representation
> > of code points as sequences of bytes should be under the hood.
>
> Misleading.
>
> It isn't obvious that Scheme characters should be _Unicode_
> codepoints. For (much) more inclusive definitions of "codepoint",
> that characters should be codepoints is tautologically true.

Fair enough, though I think Unicode is the best choice at present. It might be perfectly fine to leave that agnostic too. (If you don't want to specify even Unicode, then you certainly can't specify UTF-8!)

> There's a serious problem regarding Scheme and Unicode in that, for
> any sane definition of "character" in Unicode, the character type in
> R5RS is not sanely isomorphic.

I think there is a problem in that the R5RS character functions are simply too simplistic, most notably the case-mapping functions. Case-mapping is a locale-dependent task; however difficult that may make the world, it's a fact of the world. Many, many computer systems could get away with ignoring the locale-dependency of case-mapping, but now they can no longer plead ignorance. (Though the problems are hardly obscure; even German causes problems.)

I would like to see Scheme DTRT, which means not creating a foolish oversimplification. We have finally gotten away from oversimplifying numbers; it's time to stop oversimplifying characters too. We are stuck with R5RS at present, but we should at least not make things worse.

Ok, off that soapbox: I am happy to let others hash out the actual topic of this SRFI. My concern is that the SRFI not start constraining Scheme in a bad way, and if you start saying things like "Scheme strings are UTF-8", I start to get *really* nervous that someone is going to start making a single codepoint take up multiple elements in a Scheme string.

Thomas
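A small illustration of the two concerns above, sketched in Python 3 only because it exposes codepoints and UTF-8 bytes directly; it is not Scheme and not anything proposed in SRFI 50. It assumes Python 3 semantics: `str` elements are codepoints, and `str.upper()` applies Unicode's default, locale-independent full case mapping. The helper `char_upcase` is hypothetical, modelling an R5RS-style `char-upcase` that must return exactly one character.

```python
# Sketch only: Python 3 stands in for a Scheme with Unicode strings.
# Python str elements are codepoints; bytes elements are UTF-8 code units.

def codepoints(s: str) -> int:
    """Number of Unicode codepoints in s."""
    return len(s)

def utf8_units(s: str) -> int:
    """Number of bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))

def char_upcase(c: str) -> str:
    """Hypothetical R5RS-style char-upcase: one codepoint in, one out.

    When Unicode's full uppercase mapping is not one-to-one, keep the
    character unchanged, i.e. the lossy simplification the post warns about.
    """
    u = c.upper()
    return u if len(u) == 1 else c

# U+00E9 (e with acute) is one codepoint but two UTF-8 bytes, so a
# "string of UTF-8 bytes" would give this one character two elements.
e_acute = "\u00e9"
print(codepoints(e_acute), utf8_units(e_acute))  # 1 2
print(list(e_acute.encode("utf-8")))             # [195, 169]

# German sharp s (U+00DF) uppercases to two codepoints, "SS".
word = "stra\u00dfe"
by_char = "".join(char_upcase(c) for c in word)
print(by_char)        # STRAßE  (per-character mapping cannot expand)
print(word.upper())   # STRASSE (full string mapping can)
```

Python's default mapping also ignores locale, so `"i".upper()` gives `"I"` even where Turkish rules call for U+0130; a locale-aware mapping needs tailoring beyond what this sketch shows.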