This page is part of the web mail archives of SRFI 75 from before July 7th, 2015.
John Cowan <cowan@xxxxxxxx> writes:

> bear scripsit:
>
>> I feel that "unicode default grapheme clusters" more closely
>> map to what users call "characters" than codepoints do. In
>> the interests of keeping the abstractions used by the programmer
>> as close as possible to the abstractions used by ordinary users,
>> I therefore support defining scheme characters as DCG's.
>
> While I disagree with this position, it is entirely coherent and
> consistent, and I wouldn't weep if R6RS went with it.

Ditto.

> The main argument against with it is that the "user" view of
> characters can cross DCG boundaries. From the codepoint level,
> other levels can be built up, including the DCG, syllable, word,
> sentence, and paragraph.

I wouldn't mind if R6RS went with code points, either.

In either case, what I would really like to see is an end to
confusing terms such as "character". R6RS could very well use code
points as the smallest units of strings, but then it should call
them code points, not characters. The only reason we're still
talking about "characters" is backwards compatibility.

I can, without confusion, talk about Latin-1 "code points", but
talking about Unicode "characters" (meaning code points) is broken.

Regards,
        -- Jorgen

-- 
((email . "forcer@xxxxxxxxx") (www . "http://www.forcix.cx/")
 (gpg   . "1024D/028AF63C")   (irc . "nick forcer on IRCnet"))