
Re: character strings versus byte strings



bear <bear@xxxxxxxxx> writes:

> Each character is a unicode codepoint plus a non-defective sequence of
> unicode combining codepoints.  The unicode documentation refers to these
> entities as "graphemes."

I should revise what I said; there may well be a case for Scheme
characters being graphemes instead of codepoints.  I lean toward
codepoints, but I recognize that graphemes are a good contender.

My post was intended to argue against UTF-8, but moving further up the
abstraction ladder than codepoints may well be right.
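
To make the three levels concrete, here is a minimal Python sketch (only
an illustration, not part of any proposal) that counts the same string as
UTF-8 bytes, as codepoints, and as graphemes in the sense quoted above: a
codepoint plus the combining codepoints that follow it.  The helper name
`graphemes` is hypothetical, and the grouping rule is a simplification;
it does not implement the full Unicode segmentation rules.

```python
import unicodedata


def graphemes(text):
    """Group each codepoint with the combining codepoints that follow it.

    Simplified rule (an assumption for illustration): a codepoint whose
    canonical combining class is nonzero attaches to the preceding cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


s = "e\u0301a"  # 'e', COMBINING ACUTE ACCENT, 'a'

byte_count = len(s.encode("utf-8"))   # UTF-8 bytes: 4
codepoint_count = len(s)              # codepoints: 3
grapheme_count = len(graphemes(s))    # graphemes: 2
```

The same text has length 4, 3, or 2 depending on which unit a "character"
is taken to be, which is the choice under discussion.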

Thomas