
Re: Surrogates and character representation

Tom Emerson scripsit:

> Surrogates are a side-effect of UTF-16. Period. Application-level code
> just doesn't see them. This entire discussion about whether or not a
> CHAR should include surrogate code points is, IMHO, a waste of
> everyone's talents here. It's much ado about nothing.

I agree that application developers rarely have to think about surrogates,
but language/library designers (whose job it is to make corner cases
unsurprising) do have to think about them.

FWIW, I now think (after some talk on a private Unicode list) that it's
correct to allow surrogates as Scheme characters; that is, the range of
char->integer should be 0 to #x10FFFF.
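
To make the corner case concrete, here is a minimal sketch in Python rather
than Scheme, chosen only because Python's built-in chr and ord already
behave this way. The names char_to_integer, integer_to_char, is_surrogate
and is_encodable are hypothetical stand-ins for illustration, not any
Scheme implementation's API.

```python
# Sketch only: Python strings stand in for Scheme characters.
# All helper names below are hypothetical, chosen to mirror the Scheme ones.

MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """True if cp is in the surrogate block U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def integer_to_char(cp: int) -> str:
    """Accept every code point from 0 to #x10FFFF, surrogates included."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return chr(cp)


def char_to_integer(ch: str) -> int:
    """Inverse of integer_to_char for a one-character string."""
    return ord(ch)


def is_encodable(ch: str) -> bool:
    """Strict UTF-8 refuses lone surrogates, so this is where they surface."""
    try:
        ch.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for cp in (0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, MAX_CODE_POINT):
        ch = integer_to_char(cp)
        print(f"{cp:#08x} surrogate={is_surrogate(cp)} "
              f"round-trips={char_to_integer(ch) == cp} "
              f"utf8-ok={is_encodable(ch)}")
```

Under this design every integer in the range round-trips through the
character type, and the only place a surrogate causes trouble is at the
encoding boundary, which is a library-level concern, not an
application-level one.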

John Cowan  jcowan@xxxxxxxxxxxxxxxxx  www.reutershealth.com  www.ccil.org/~cowan
It's the old, old story.  Droid meets droid.  Droid becomes chameleon. 
Droid loses chameleon, chameleon becomes blob, droid gets blob back
again.  It's a classic tale.  --Kryten, Red Dwarf