Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: jcowan@xxxxxxxxxxxxxxxxx
Subject: Re: Surrogates and character representation
From: Shiro Kawai <shiro@xxxxxxxx>
Date: Sat, 23 Jul 2005 22:14:56 -1000 (HST)
Cc: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <20050724053713.GM2784@NYCMJCOWA2>
References: <878xzykn0y.fsf@xxxxxxxxxxxxxxxxx> <17122.31220.22073.72951@xxxxxxxxxxxxxxxxxxxxxx> <20050724053713.GM2784@NYCMJCOWA2>

From: "John.Cowan" <jcowan@xxxxxxxxxxxxxxxxx>
Subject: Re: Surrogates and character representation
Date: Sun, 24 Jul 2005 01:37:13 -0400

> but language/library designers (whose job it is to make corner cases
> unsurprising) do have to think about them.

Yes, but such a library works across different domains.
Suppose the library has a function ucs->utf8.  It accepts a character
and returns a sequence of octets, e.g.
  (ucs->utf8 #\u3042) => (#xe3 #x81 #x82)
If it returned (#\u00e3 #\u0081 #\u0082), I'd say something is
wrong with it: it mixes up the domain and the range.
The same is true of ucs->utf16: its type should be Char -> [Int16],
and unpaired surrogates appear as Int16 values.
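
To make the domain/range point concrete, here is a minimal sketch in
Python (not Scheme, and not any library's actual API).  The names
ucs_to_utf8 and ucs_to_utf16 are hypothetical stand-ins for ucs->utf8
and ucs->utf16.  Both take one character and return plain integers,
never characters.

```python
from __future__ import annotations


def ucs_to_utf8(ch: str) -> list[int]:
    """Map one character to its UTF-8 octets (integers 0..255)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # bytes -> list of ints: the range is octets, not characters
    return list(ch.encode("utf-8"))


def ucs_to_utf16(ch: str) -> list[int]:
    """Map one character to its UTF-16 code units (integers 0..0xFFFF)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    cp = ord(ch)
    if cp < 0x10000:
        # One code unit.  Assumption: a code point in D800..DFFF passed
        # here is emitted unchanged; whether the domain should admit it
        # at all is the open question in this thread.
        return [cp]
    cp -= 0x10000
    # Two code units: high surrogate, then low surrogate
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
```

With this sketch, ucs_to_utf8("\u3042") gives [0xE3, 0x81, 0x82], and
ucs_to_utf16("\U0001D11E") gives [0xD834, 0xDD1E].  The surrogate
values show up only on the integer side.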

The implementation can have #\ud800, as long as it defines the
behavior of expressions such as (ucs->utf16 #\ud800) or
(string-append "\ud800" "\udc00"), as well as I/O.  If we put
it in the standard, the standard should define those
expressions.  Do you think there is an agreeable and consistent
definition for handling these "characters"?  If not, it's better
to leave it unspecified.
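
As one data point, and not a proposed definition, Python 3 is a
system that does admit lone surrogates in strings, so it has had to
answer each of these questions.  The sketch below only records what
that one implementation does; the function name surrogate_behaviour
and the dictionary keys are made up for illustration.

```python
def surrogate_behaviour() -> dict:
    """Record how Python 3 treats the cases raised above."""
    high, low = "\ud800", "\udc00"   # lone surrogates held in a str
    joined = high + low              # analogue of string-append

    result = {
        # Appending does NOT fuse the pair into U+10000
        "length_after_append": len(joined),
        "equals_U+10000": joined == "\U00010000",
    }

    # Output side: the strict encoder refuses surrogates
    try:
        joined.encode("utf-16-le")
        result["strict_encode"] = "ok"
    except UnicodeEncodeError:
        result["strict_encode"] = "error"

    # The opt-in "surrogatepass" handler writes them as code units,
    # and reading those units back yields a single character
    units = joined.encode("utf-16-le", "surrogatepass")
    result["round_trip_length"] = len(units.decode("utf-16-le"))
    return result
```

Here the appended string has length 2 and is not equal to "\U00010000",
strict encoding fails, and the surrogatepass round trip comes back with
length 1.  So even a system that defines every case ends up with a
string that does not survive its own I/O unchanged.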

(BTW, I am using a weird Scheme system that allows such invalid
"characters" in a string, and sometimes it is handy, but it is ugly.)

--shiro
