
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: "John.Cowan" <jcowan@xxxxxxxxxxxxxxxxx>
Subject: Re: Surrogates and character representation
From: Per Bothner <per@xxxxxxxxxxx>
Date: Sun, 24 Jul 2005 16:26:25 -0700
Cc: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <20050724230151.GP2784@NYCMJCOWA2>
References: <1122002894.6607.29.camel@xxxxxxxxxxxxxx> <17120.28178.788826.533753@xxxxxxxxxxxxxxxxxxxxxx> <20050722040917.GB7576@NYCMJCOWA2> <17120.30080.768671.539970@xxxxxxxxxxxxxxxxxxxxxx> <878xzykn0y.fsf@xxxxxxxxxxxxxxxxx> <17122.31220.22073.72951@xxxxxxxxxxxxxxxxxxxxxx> <20050724053713.GM2784@NYCMJCOWA2> <42E3D086.90403@xxxxxxxxxxxxxxxx> <17123.54753.371934.424875@xxxxxxxxxxxxxxxxxxxxxx> <42E3DD0C.5060108@xxxxxxxxxxx> <20050724230151.GP2784@NYCMJCOWA2>
User-agent: Mozilla Thunderbird 1.0.6-1.1.fc4 (X11/20050720)

John.Cowan wrote:

> Per Bothner scripsit:
>
>> It's the other way round. Using UTF-8 as an internal representation is just fine for *applications*. The problem is that certain *API*s have a concept of indexing into a string, and unfortunately R5RS is one of them. In itself indexing of strings is a useless feature, as it can be replaced by a sequential-access cursor/iterator API - but historically the Scheme cursor/iterator API uses integers for the "cursor". And existing code moves the "cursor" forwards by adding 1.
>
> By the same token, random-access disks are a useless feature, for they
> can be replaced by sequential-access DECtapes that can be rewound and
> selectively rewritten.  But at a price.

You're misunderstanding my point, perhaps because I was unclear. There are very few applications where you want to "get the N'th record of a file", in the sense that N is semantically meaningful. There are lots of applications where you want to get to a record fast, using random access given a "cookie": i.e. some way that the implementation can efficiently map the cookie into the disk location of the record. The cookie may be the disk address of the record, or its offset in a file, which may not have any direct relationship to N, especially if you have variable-length records.
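
To make the "cookie" idea concrete, here is a minimal sketch in Python. The length-prefixed record layout and the names `append_record` and `read_record` are assumptions made for illustration only; they do not come from any API discussed in this thread. The cookie is simply the byte offset of the record, so it bears no fixed relationship to the record's ordinal position N.

```python
import io
import struct

def append_record(f, payload: bytes) -> int:
    """Append a length-prefixed record and return its byte offset (the cookie)."""
    f.seek(0, io.SEEK_END)
    cookie = f.tell()
    f.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length prefix
    f.write(payload)
    return cookie

def read_record(f, cookie: int) -> bytes:
    """Seek directly to the cookie and read one record, with no scan from the start."""
    f.seek(cookie)
    (length,) = struct.unpack(">I", f.read(4))
    return f.read(length)

# Variable-length records: the cookies are not multiples of any record size.
store = io.BytesIO()
cookies = [append_record(store, r) for r in (b"a", b"longer record", b"xyz")]
second = read_record(store, cookies[1])
```

The caller keeps the cookie it was handed when the record was written; it never needs to compute a location from N.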

Similarly, it is often useful to have random access in a long string, perhaps one representing an Emacs buffer. However, you want to efficiently access substrings, not characters. Furthermore, you're interested in substrings defined in terms of previously-seen positions - or "marks" in the Emacs sense - not character indexes. E.g. the substring matching a regexp.
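
As a rough sketch of what mark-based access over a UTF-8 buffer could look like, the Python below treats a mark as a byte offset on a code-point boundary. The helper names `advance` and `substring` are hypothetical, and this is not the API proposed in the message linked below; it only illustrates that a regexp match can hand back marks directly, and that moving a mark forward is a cursor operation rather than "add 1".

```python
import re

def advance(buf: bytes, mark: int) -> int:
    """Return the mark just past the code point that starts at `mark`."""
    mark += 1
    # Skip UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
    while mark < len(buf) and (buf[mark] & 0xC0) == 0x80:
        mark += 1
    return mark

def substring(buf: bytes, start: int, end: int) -> str:
    """Extract the text between two marks; cost depends only on its length."""
    return buf[start:end].decode("utf-8")

buf = "naïve café, 日本語 text".encode("utf-8")

# A regexp search over the bytes yields marks (byte offsets), not character indexes.
match = re.search("café".encode("utf-8"), buf)
start, end = match.span()
found = substring(buf, start, end)

# Sequential traversal: step the cursor with advance() instead of adding 1.
count, mark = 0, 0
while mark < len(buf):
    mark = advance(buf, mark)
    count += 1
```

Neither operation requires knowing how many characters precede a mark.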

Specifically, can you think of any application where this suggestion would lead to performance problems:

http://srfi.schemers.org/srfi-75/mail-archive/msg00050.html
--
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/
