
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: "John.Cowan" <jcowan@xxxxxxxxxxxxxxxxx>
Subject: Re: Surrogates and character representation
From: Tom Emerson <tree@xxxxxxxxxxxxx>
Date: Sun, 24 Jul 2005 09:25:29 -0400
Cc: Thomas Bushnell BSG <tb@xxxxxxxxxx>, srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <20050724053713.GM2784@NYCMJCOWA2>
References: <1122002894.6607.29.camel@xxxxxxxxxxxxxx> <17120.28178.788826.533753@xxxxxxxxxxxxxxxxxxxxxx> <20050722040917.GB7576@NYCMJCOWA2> <17120.30080.768671.539970@xxxxxxxxxxxxxxxxxxxxxx> <878xzykn0y.fsf@xxxxxxxxxxxxxxxxx> <17122.31220.22073.72951@xxxxxxxxxxxxxxxxxxxxxx> <20050724053713.GM2784@NYCMJCOWA2>
Reply-to: tree@xxxxxxxxxxxxx

John.Cowan writes:
> > Surrogates are a side-effect of UTF-16. Period. Application-level code
> > just doesn't see them. This entire discussion about whether or not a
> > CHAR should include surrogate code points is, IMHO, a waste of
> > everyone's talents here. It's much ado about nothing.
> 
> I agree that applications developers rarely have to think about surrogates,
> but language/library designers (whose job it is to make corner cases
> unsurprising) do have to think about them.

I disagree that surrogates are a corner case. If the implementation
does nothing special with them, then encountering an unpaired surrogate
in a string is no different from encountering #xFFFE. Heck, even paired
surrogates in a string are semantically meaningless but valid.
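To make the point concrete, here is a minimal sketch, in Python rather
than Scheme, since Python strings happen to be sequences of code points
in the range 0 to #x10FFFF that may include surrogates. The helper names
(is_surrogate, is_noncharacter, classify) are made up for illustration
and are not part of any proposal in this thread.

```python
def is_surrogate(cp: int) -> bool:
    # Surrogate code points occupy U+D800..U+DFFF.
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # Noncharacters: U+FDD0..U+FDEF, plus the last two code points
    # of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(s: str) -> list[str]:
    # Label each code point; nothing is rejected, paired, or rewritten.
    labels = []
    for ch in s:
        cp = ord(ch)
        if is_surrogate(cp):
            labels.append("surrogate")
        elif is_noncharacter(cp):
            labels.append("noncharacter")
        else:
            labels.append("scalar")
    return labels


# An unpaired surrogate, U+FFFE, and a high/low surrogate pair, all
# stored as ordinary elements of the string.
sample = "a\ud800\ufffe\ud83d\ude00"
labels = classify(sample)
```

In this sketch the string has five elements: the unpaired surrogate and
#xFFFE each sit in it as one code point, and the surrogate pair stays
two separate code points rather than being combined into one character.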

> FWIW, I now think (after some talk on a private Unicode list) that it's
> correct to allow surrogates as Scheme characters; that is, the range of
> char->integer should be 0 to #x10FFFF.

The arguments that Ken and Mark made there to change your mind may be
worth summarizing here.

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"
