
Re: Surrogates and character representation



Alan Watson scripsit:

> Hmm. That would seem to prevent an implementation representing strings 
> internally using UTF-8. This is convenient in some contexts as Scheme 
> strings can be trivially converted to UTF-8 C strings.

Not at all.  There is a well-defined UTF-8 encoding for every Unicode
code point (which is not the case for UTF-16).  See Table 3-6 in
the Unicode Standard 4.0.
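
To illustrate the point, here is a minimal sketch of the UTF-8 bit
distribution applied mechanically to any code point, surrogates included.
The function name utf8_encode is hypothetical and is not taken from any
Scheme implementation. Note the assumption: this shows only the bit
pattern an implementation could use internally; the Unicode Standard
treats the three-byte forms for U+D800..U+DFFF as ill-formed in
interchange, so a conformant encoder would reject them at the boundary.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the UTF-8 bit distribution.

    Assumption: surrogate code points (U+D800..U+DFFF) are deliberately
    NOT rejected, to show that the bit pattern is defined for them too.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

For ordinary scalar values this agrees with Python's built-in codec, for
example utf8_encode(0x20AC) == "\u20ac".encode("utf-8"). For a lone
surrogate such as 0xD800 the built-in codec refuses unless it is given the
"surrogatepass" error handler, in which case it yields the same three
bytes as the sketch.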

-- 
Here lies the Christian,                        John Cowan
        judge, and poet Peter,                  http://www.reutershealth.com
Who broke the laws of God                       http://www.ccil.org/~cowan
        and man and metre.                      jcowan@xxxxxxxxxxxxxxxxx