
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



Tom Emerson scripsit:

> If you treat the surrogates as undefined within the character range,
> then you must (for consistency) treat all of the other undefined
> abstract characters as holes. This just complicates processing.

All other undefined codepoints are potentially definable: they correspond
to Unicode scalar values.  Surrogate codepoints are not definable and
don't correspond to any Unicode scalar value.  The difference is
architectural.
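
To make the distinction concrete, here is a minimal sketch in Python.
The helper names is_scalar_value and encodes_as_utf8 are hypothetical,
chosen only for this illustration. It assumes the standard ranges: code
points run from 0 to 0x10FFFF, and the surrogates occupy 0xD800 to
0xDFFF. A code point with no character assigned to it is still a scalar
value and still has a UTF-8 encoding; a surrogate is neither.

```python
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_scalar_value(cp):
    """Return True if cp is a Unicode scalar value (hypothetical helper).

    Assignment status is deliberately ignored: an unassigned code point
    outside the surrogate range still counts as a scalar value.
    """
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    return not (SURROGATE_FIRST <= cp <= SURROGATE_LAST)


def encodes_as_utf8(cp):
    """Return True if Python can encode the code point cp as UTF-8."""
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python refuses to encode lone surrogates in strict mode.
        return False
```

With these definitions, is_scalar_value(0xE000) holds while
is_scalar_value(0xD800) does not, and encodes_as_utf8 agrees with
is_scalar_value on every code point in range.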

> One question I've had: how are 8-bit (i.e., byte) strings handled
> here? Is there no distinction between operations on raw bytes and
> operations on characters?

Those things are not strings: they are vectors of unsigned 8-bit integers.
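
As an illustration only, Python draws the same line between the two
types: bytes is a sequence of integers in the range 0..255, and str is a
sequence of code points. The sketch below assumes the bytes happen to
hold UTF-8 text; the function name byte_and_char_views is hypothetical.

```python
def byte_and_char_views(raw):
    """Return (octets, chars) for a bytes object assumed to hold UTF-8."""
    octets = list(raw)                 # unsigned 8-bit integers
    chars = list(raw.decode("utf-8"))  # characters, after decoding
    return octets, chars


# b"caf\xc3\xa9" is the UTF-8 encoding of "caf" followed by U+00E9.
octets, chars = byte_and_char_views(b"caf\xc3\xa9")
```

Here octets has five elements and chars has four, because the two
bytes 0xC3 0xA9 decode to the single character U+00E9. Indexing,
length, and comparison mean different things on each view.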

-- 
John Cowan      jcowan@xxxxxxxxxxxxxxxxx        http://www.ccil.org/~cowan
        Is it not written, "That which is written, is written"?