
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: srfi-75-request@xxxxxxxxxxxxxxxxx
Subject: Re: Surrogates and character representation
From: Ken Dickey <Ken.Dickey@xxxxxxxxxxxxxx>
Date: Sat, 23 Jul 2005 17:05:55 +0200 (DFT)
Cc: Thomas Bushnell BSG <tb@xxxxxxxxxx>, tree@xxxxxxxxxxxxx, "John.Cowan" <jcowan@xxxxxxxxxxxxxxxxx>
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx
Delivered-to: srfi-admin@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75-request@xxxxxxxxxxxxxxxxx
In-reply-to: <878xzykn0y.fsf@xxxxxxxxxxxxxxxxx>
Old-date: Sat, 23 Jul 2005 07:07:09 -0700
Organization: BitWize Consulting
References: <1122002894.6607.29.camel@xxxxxxxxxxxxxx> <17120.30080.768671.539970@xxxxxxxxxxxxxxxxxxxxxx> <878xzykn0y.fsf@xxxxxxxxxxxxxxxxx>
Resent-date: Sun, 24 Jul 2005 11:35:21 +0200
Resent-from: Michael Sperber <sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Resent-message-id: <y9lek9o8s2u.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Resent-to: srfi-75@xxxxxxxxxxxxxxxxx
User-agent: KMail/1.7

On Saturday 23 July 2005 00:19, Thomas Bushnell BSG wrote:
> Tom Emerson <tree@xxxxxxxxxxxxx> writes:
> > Surrogate codepoints have a character property. They should be usable
> > in a string, and individually can be considered a character.
>
> This is exactly part of the reason why char=codepoint is such a lose.
> Most code doesn't *want* to see this kind of garbage; it's an encoding
> issue.  I want chars where the *computer* takes care of the coding.  I
> want chars that are fully-understood characters, not little pieces of
> a character.

This points out a tension underlying this thread.

There are two discussions intertwined here: [1] access to and use of 
Unicode within Scheme (e.g. to process internationalized web pages) and [2] 
bringing Unicode into Scheme (extending the Symbol & String datatypes).  

SRFI-75 specifically addresses the second of these goals and (wisely) states 
that the first goal is left to another SRFI.
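To make the surrogate complaint in the quoted message concrete: a character outside the Basic Multilingual Plane is a single code point, but UTF-16 represents it as two code units (a high and a low surrogate). The sketch below is only an illustration, written in Python rather than Scheme; the function names `utf16_code_units` and `combine_surrogates` are hypothetical and come from neither SRFI-75 nor any library.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as ints) for one code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)
    if cp < 0x10000:
        # Basic Multilingual Plane: one code unit, equal to the code point.
        return [cp]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


def combine_surrogates(high, low):
    """Recombine a surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
g_clef = "\U0001D11E"
units = utf16_code_units(g_clef)
print([hex(u) for u in units])            # ['0xd834', '0xdd1e']
print(hex(combine_surrogates(*units)))    # 0x1d11e
```

Code that treats each of the two units as a "character" sees the pieces; code that works on code points, or on something higher-level, sees one character. Which of these a Scheme `char` should be is the point under dispute.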

I for one would be satisfied to be able to portably manipulate Unicode using 
Scheme source encoded in ASCII (or UTF-8). In particular, I would be willing 
to have a separate datatype (or datatypes) and libraries to accomplish this.

Would anyone care to post a Unicode Encoding & I/O SRFI, so that the *other* 
discussion can be moved from this thread to that one?

$0.02,
-KenD

