This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
> Tom Lord wrote:
>> [*] What exactly is a "Unicode character?" The answer can vary
>> depending on context. In some contexts it might mean a Unicode
>> abstract character -- the kind of value to which a codepoint
>> (integer in the range 0..10ffff) is assigned. In other contexts,
>> it may mean certain kinds of sequences of abstract characters.
>>
>> One goal for SRFI-52 is to remain agnostic about the answer
>> to that question.

Robby Findler wrote:
> I'm still relatively new to unicode, so I apologize if this is a
> foolish question (rtfm ptrs welcome!), but I wonder why you would want
> to remain agnostic on this point. Can you explain why unicode-code
> points would be a bad choice, and what other choices might exist?

Short version: In general, a single character on your screen may
actually be made of several Unicode code points. For example, the
grapheme[*] é (small E with acute accent) can be encoded as a base
character (small E) plus a combining mark (acute accent).

Most internal Unicode encodings use code points as the basic "character"
unit. In those systems, the letter é is one symbol on screen but two
"character" units in memory. Other systems combine the code points much
earlier, such that é is only one "character" unit both on-screen and
in-memory. (For example, Bear's scheme stores characters as bignums with
each code point stored as a "big digit.")

There are advantages and disadvantages to both approaches. The "unit is
code point" method makes string indexing and mutation more difficult,
and it makes procedures like char-upcase nonsensical (because a
character is only a partial thing, in general). The "unit is grapheme"
approach avoids most of that -- although letters like ß are still a
problem for case-folding -- but generally requires more space to store
the same data.

[*] "Grapheme" is the name for "what humans think of when you talk about
    characters," more or less.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd
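
To make the code point versus grapheme distinction above concrete, here
is a minimal sketch in Python (chosen only for illustration; it is not
part of SRFI-52 or of any Scheme implementation mentioned here). The
helper name grapheme_units is hypothetical, and it assumes a grapheme is
simply a base code point plus any combining marks that follow it, which
is a simplification of real grapheme segmentation.

```python
import unicodedata

# The same on-screen letter, stored two different ways.
composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # small e followed by a combining acute accent


def grapheme_units(text):
    """Group each base code point with the combining marks after it.

    Hypothetical helper: a rough approximation of "unit is grapheme".
    Full grapheme segmentation has more rules than this.
    """
    units = []
    for ch in text:
        if units and unicodedata.combining(ch) != 0:
            units[-1] += ch      # attach the mark to the previous unit
        else:
            units.append(ch)     # start a new unit
    return units


# "Unit is code point": the decomposed form has two units in memory.
print(len(composed), len(decomposed))

# "Unit is grapheme": both forms are one unit.
print(len(grapheme_units(composed)), len(grapheme_units(decomposed)))

# Normalizing to NFC combines the pair into the precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == composed)

# ß stays awkward either way: upcasing one unit yields two.
print("ß".upper())
```

In the code point view, indexing into the decomposed string can return a
bare accent, which is the "partial thing" problem for a procedure like
char-upcase. The ß case shows why even a grapheme-based unit does not
make case conversion a one-to-one mapping.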