Re: Surrogates and character representation

On Saturday 23 July 2005 00:19, Thomas Bushnell BSG wrote:
> Tom Emerson <tree@xxxxxxxxxxxxx> writes:
> > Surrogate codepoints have a character property. They should be usable
> > in a string, and individually can be considered a character.
> This is exactly part of the reason why char=codepoint is such a lose.
> Most code doesn't *want* to see this kind of garbage; it's an encoding
> issue.  I want chars where the *computer* takes care of the coding.  I
> want chars that are fully-understood characters, not little pieces of
> a character.

This points out a tension underlying this thread.

There are two discussions intertwined here: [1] access to and use of
Unicode within Scheme (e.g. to process internationalized web pages), and [2]
bringing Unicode into Scheme (extending the Symbol and String datatypes).

SRFI-75 specifically addresses the second of these goals and (wisely) states 
that the first goal is left to another SRFI.

I for one would be satisfied to be able to portably manipulate Unicode using
Scheme source encoded in ASCII (or UTF-8). In particular, I would be willing
to have a separate datatype (or datatypes) and libraries to accomplish this.

Would anyone care to post a Unicode Encoding & I/O SRFI, so that the *other*
discussion (goal [1]) can be moved from this thread to that one?