
Re: Surrogates and character representation

John.Cowan writes:
> > Surrogates are a side-effect of UTF-16. Period. Application-level code
> > just doesn't see them. This entire discussion about whether or not a
> > CHAR should include surrogate code points is, IMHO, a waste of
> > everyone's talents here. It's much ado about nothing.
> I agree that applications developers rarely have to think about surrogates,
> but language/library designers (whose job it is to make corner cases
> unsurprising) do have to think about them.

I disagree that surrogates are a corner case. If you do nothing special
with them, encountering an unpaired surrogate in a string is no different
from encountering #xFFFE. Heck, even encountering paired surrogates in a
string is semantically meaningless but valid.
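
To make the "do nothing with them" point concrete, here is a minimal
sketch in Python. The function name decode_utf16_units and the choice to
pass unpaired surrogates through unchanged are my own assumptions for
illustration, not anything proposed in this thread. It combines
well-formed pairs into one code point and otherwise treats a lone
surrogate like any other 16-bit value, #xFFFE included.

```python
def decode_utf16_units(units):
    """Turn a list of 16-bit code units into a list of code points.

    Hypothetical helper: a high surrogate followed by a low surrogate
    becomes one supplementary code point; anything else, including an
    unpaired surrogate, is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Standard UTF-16 pairing arithmetic.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # No special handling: a lone surrogate is just another value.
            out.append(u)
            i += 1
    return out
```

With this sketch, the pair [0xD83D, 0xDE00] comes out as the single code
point #x1F600, while [0x41, 0xD800, 0x42] comes out unchanged. Every
result stays within 0 to #x10FFFF.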

> FWIW, I now think (after some talk on a private Unicode list) that it's
> correct to allow surrogates as Scheme characters; that is, the range of
> char->integer should be 0 to #x10FFFF.

The arguments that Ken and Mark made there to change your mind may be
worth summarizing here.

Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"