Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

John.Cowan writes:
> > Surrogates are a side-effect of UTF-16. Period. Application-level code
> > just doesn't see them. This entire discussion about whether or not a
> > CHAR should include surrogate code points is, IMHO, a waste of
> > everyone's talents here. It's much ado about nothing.
> I agree that applications developers rarely have to think about surrogates,
> but language/library designers (whose job it is to make corner cases
> unsurprising) do have to think about them.

I disagree that surrogates are a corner case. If you do nothing special
with them, encountering an unpaired surrogate in a string is no different
from encountering #xFFFE. Heck, even encountering paired surrogates in a
string is semantically meaningless but valid.
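
To make the paired/unpaired distinction concrete, here is a minimal sketch
in Python (not Scheme, and not anything proposed in SRFI 75). The function
names to_utf16_units and unpaired_surrogates are hypothetical, chosen only
for illustration. It assumes the standard UTF-16 layout: high surrogates
are #xD800-#xDBFF, low surrogates are #xDC00-#xDFFF, and a high/low pair
encodes one code point at or above #x10000.

```python
def to_utf16_units(cp):
    """Encode one code point (0..0x10FFFF) as a list of UTF-16 code units.

    Code points below 0x10000 pass through as a single unit. That includes
    surrogate code points themselves, which is how an unpaired surrogate
    can end up in a string when nothing special is done with them.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range: %#x" % cp)
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                      # 20 bits to split across the pair
    return [0xD800 + (v >> 10),           # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]         # low surrogate: bottom 10 bits


def unpaired_surrogates(units):
    """Return the indices of surrogate code units that are not part of a
    well-formed high/low pair."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate is fine only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            bad.append(i)
        i += 1
    return bad
```

For example, to_utf16_units(0x1D11E) gives the pair #xD834 #xDD1E, and
unpaired_surrogates flags only surrogate units that have no partner. A
library that does nothing with surrogates would simply carry such units
along unchanged.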

> FWIW, I now think (after some talk on a private Unicode list) that it's
> correct to allow surrogates as Scheme characters; that is, the range of
> char->integer should be 0 to #x10FFFF.

The arguments that Ken and Mark made there to change your mind may be
worth summarizing here.

Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"