Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: tree@xxxxxxxxxxxxx
Subject: Re: Surrogates and character representation
From: Per Bothner <per@xxxxxxxxxxx>
Date: Sun, 24 Jul 2005 11:25:16 -0700
Cc: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <17123.54753.371934.424875@xxxxxxxxxxxxxxxxxxxxxx>
References: <1122002894.6607.29.camel@xxxxxxxxxxxxxx> <17120.28178.788826.533753@xxxxxxxxxxxxxxxxxxxxxx> <20050722040917.GB7576@NYCMJCOWA2> <17120.30080.768671.539970@xxxxxxxxxxxxxxxxxxxxxx> <878xzykn0y.fsf@xxxxxxxxxxxxxxxxx> <17122.31220.22073.72951@xxxxxxxxxxxxxxxxxxxxxx> <20050724053713.GM2784@NYCMJCOWA2> <42E3D086.90403@xxxxxxxxxxxxxxxx> <17123.54753.371934.424875@xxxxxxxxxxxxxxxxxxxxxx>
User-agent: Mozilla Thunderbird 1.0.6-1.1.fc4 (X11/20050720)

Tom Emerson wrote:

> Representing strings internally in UTF-8 is a loss though, since you
> lose random access to the string.

Random access to a previously accessed position works just fine - just
use the byte offset.

Random access to a position in a string that has not been previously
accessed is not in itself useful.
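
A minimal sketch of the byte-offset point, in Python, assuming the string is held as UTF-8 bytes and that a position found during an earlier scan is remembered as a byte offset. The helper `char_at` is hypothetical, not part of R5RS or any SRFI.

```python
def char_at(buf: bytes, offset: int) -> str:
    """Decode the one character that starts at a known byte offset."""
    lead = buf[offset]
    # The lead byte alone tells us how many bytes the character occupies.
    if lead < 0x80:
        length = 1
    elif lead >> 5 == 0b110:
        length = 2
    elif lead >> 4 == 0b1110:
        length = 3
    elif lead >> 3 == 0b11110:
        length = 4
    else:
        raise ValueError("offset does not point at the start of a character")
    return buf[offset:offset + length].decode("utf-8")


text = "naïve café".encode("utf-8")
# Position remembered from an earlier scan (here: a substring search).
saved = text.index("é".encode("utf-8"))
last = char_at(text, saved)  # constant-time return to the saved position
```

Returning to `saved` needs no scan from the start of the string. Note that the byte offset (10) differs from the character index (9), because "ï" occupies two bytes.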

> For some applications this isn't a big deal, but in general using UTF-8
> as an internal representation is a bad idea.

It's the other way round. Using UTF-8 as an internal representation is
just fine for *applications*. The problem is that certain *API*s have a
concept of indexing into a string, and unfortunately R5RS is one of
them. In itself indexing of strings is a useless feature, as it can be
replaced by a sequential-access cursor/iterator API - but historically
the Scheme cursor/iterator API uses integers for the "cursor". And
existing code moves the "cursor" forwards by adding 1.
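
To illustrate what a sequential-access cursor API over UTF-8 might look like, here is a rough Python sketch. The names `cursor_start`, `cursor_at_end`, `cursor_next` and `cursor_ref` are invented for this example, and the cursor is assumed to be an opaque byte offset that callers advance only through `cursor_next`.

```python
def cursor_start(buf: bytes) -> int:
    """Cursor for the first character."""
    return 0


def cursor_at_end(buf: bytes, cur: int) -> bool:
    """True once the cursor has moved past the last character."""
    return cur >= len(buf)


def cursor_next(buf: bytes, cur: int) -> int:
    """Advance to the start of the next character."""
    cur += 1
    # Skip UTF-8 continuation bytes, which all look like 10xxxxxx.
    while cur < len(buf) and (buf[cur] & 0xC0) == 0x80:
        cur += 1
    return cur


def cursor_ref(buf: bytes, cur: int) -> str:
    """Character at the cursor."""
    return buf[cur:cursor_next(buf, cur)].decode("utf-8")


buf = "λx→y".encode("utf-8")
chars = []
cur = cursor_start(buf)
while not cursor_at_end(buf, cur):
    chars.append(cursor_ref(buf, cur))
    cur = cursor_next(buf, cur)
```

Code that moves the cursor by adding 1 would go from offset 0 to offset 1, which is in the middle of the two-byte encoding of "λ"; `cursor_next` goes to offset 2 instead.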

--
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/
