Re: character strings versus byte strings

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.



bear <bear@xxxxxxxxx> writes:

> Each character is a unicode codepoint plus a non-defective sequence of
> unicode combining codepoints.  The unicode documentation refers to these
> entities as "graphemes."

I should revise what I said; there may well be a case for Scheme
characters being graphemes instead of codepoints.  I lean toward
codepoints, but I recognize that graphemes are a good contender.

My post was intended to argue against UTF-8, but moving further up the
abstraction ladder than codepoints may well be right.
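
As a rough illustration of the three rungs of that ladder (UTF-8 bytes,
codepoints, graphemes), here is a small sketch.  It is in Python purely
for convenience, the helper name `graphemes` is made up for this example,
and it uses only the simple "codepoint plus following combining
codepoints" definition quoted above, not the full Unicode segmentation
rules.

```python
import unicodedata


def graphemes(text):
    """Group each base codepoint with the combining codepoints after it.

    Assumption: a "grapheme" here is just a base codepoint followed by
    zero or more codepoints with a non-zero canonical combining class.
    A combining codepoint with no base before it (a defective sequence)
    simply starts its own cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the previous base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


# 'e' followed by COMBINING ACUTE ACCENT (U+0301), then 'a'
s = "e\u0301a"

utf8_len = len(s.encode("utf-8"))   # length counted in UTF-8 bytes
codepoint_len = len(s)              # length counted in codepoints
grapheme_len = len(graphemes(s))    # length counted in graphemes
```

The same string has a different length at each level, so what a
character is decides what string-length and string-ref would mean.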

Thomas