Re: Surrogates and character representation
Tom Emerson scripsit:
> If you treat the surrogates as undefined within the character range,
> then you must (for consistency) treat all of the other undefined
> abstract characters as holes. This just complicates processing.
All other undefined codepoints are potentially definable: they correspond
to Unicode scalar values. Surrogate codepoints are not definable and
don't correspond to any Unicode scalar value. The difference is
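To make the distinction between scalar values and surrogate codepoints concrete, here is a minimal sketch in Python. The function names are hypothetical and chosen only for illustration. The ranges are the ones Unicode defines: codepoints run from 0 to 0x10FFFF, and the surrogates occupy 0xD800 through 0xDFFF.

```python
# Illustrative helpers; the names are assumptions, not part of any library.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate range U+D800..U+DFFF."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: any codepoint except a surrogate."""
    return 0 <= cp <= MAX_CODE_POINT and not is_surrogate(cp)
```

The check depends only on the numeric range, not on whether a character has been assigned: a codepoint that is merely unassigned still passes, while every surrogate fails.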
> One question I've had: how are 8-bit (i.e., byte) strings handled
> here? Is there no distinction between operations on raw bytes and
> operations on characters?
Those things are not strings: they are vectors of unsigned 8-bit integers.
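As an analogy only (the thread is not about Python), Python 3 draws the same line: `bytes` is a sequence of unsigned 8-bit integers, and it becomes a string of characters only when it is decoded under a named encoding. The variable names below are illustrative.

```python
# A one-character string and its UTF-8 encoding as raw octets.
text = "\u00e9"              # LATIN SMALL LETTER E WITH ACUTE
raw = text.encode("utf-8")   # bytes: a vector of unsigned 8-bit integers

# Indexing or iterating bytes yields integers in 0..255, not characters.
octets = list(raw)

# Characters reappear only after an explicit decode.
round_trip = raw.decode("utf-8")
```

Here `len(text)` counts characters and `len(raw)` counts octets, so the two lengths differ for any non-ASCII text encoded as UTF-8.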
John Cowan jcowan@xxxxxxxxxxxxxxxxx http://www.ccil.org/~cowan
Is it not written, "That which is written, is written"?