
Re: Surrogates and character representation



Tom Emerson scripsit:

> If you treat the surrogates as undefined within the character range,
> then you must (for consistency) treat all of the other undefined
> abstract characters as holes. This just complicates processing.

All other undefined code points are potentially definable: they correspond
to Unicode scalar values.  Surrogate code points (U+D800 through U+DFFF) are
not definable and don't correspond to any Unicode scalar value.  The
difference is architectural.
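
To make the distinction concrete, here is a minimal sketch in Python (chosen
only for illustration; the helper names is_surrogate and is_scalar_value are
hypothetical and not taken from any proposal under discussion).  It assumes
the standard Unicode code space of 0 through 0x10FFFF and the surrogate block
U+D800 through U+DFFF.

```python
# Assumed constants: the Unicode code space and the surrogate block.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True if cp falls in the surrogate block U+D800..U+DFFF."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: any code point in the
    code space that is not a surrogate, whether or not a character
    is currently assigned to it."""
    return 0 <= cp <= MAX_CODE_POINT and not is_surrogate(cp)
```

Note that the test never asks whether a character is assigned: an unassigned
code point outside the surrogate block still passes, while a surrogate fails
regardless of what any future version of Unicode defines.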

> One question I've had: how are 8-bit (i.e., byte) strings handled
> here? Is there no distinction between operations on raw bytes and
> operations on characters?

Those things are not strings: they are vectors of unsigned 8-bit integers.
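
As a rough illustration of that separation, again in Python and only as a
sketch: a str is a sequence of characters, while a bytes object is a sequence
of unsigned 8-bit integers, and the two are related only through an explicit
encoding step.  The variable names and the choice of UTF-8 are assumptions
made for the example.

```python
# A string of four characters; the last is U+00E9 (e with acute).
text = "caf\u00e9"

# Encoding produces a separate object: a vector of octets, not a string.
raw = text.encode("utf-8")

char_count = len(text)   # counts characters
octet_count = len(raw)   # counts 8-bit integers
octets = list(raw)       # each element is an int in the range 0..255

# Decoding is the explicit step back from octets to characters.
round_trip = raw.decode("utf-8")
```

Here the character count and the octet count differ, because U+00E9 takes two
octets in UTF-8, which is exactly why operations on one type should not be
confused with operations on the other.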

-- 
John Cowan      jcowan@xxxxxxxxxxxxxxxxx        http://www.ccil.org/~cowan
        Is it not written, "That which is written, is written"?