
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: "John.Cowan" <jcowan@xxxxxxxxxxxxxxxxx>
Subject: Re: Surrogates and character representation
From: Alan Watson <a.watson@xxxxxxxxxxxxxxxx>
Date: Mon, 25 Jul 2005 12:23:44 -0500
Cc: Per Bothner <per@xxxxxxxxxxx>, srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <20050724230151.GP2784@NYCMJCOWA2>
Organization: Centro de Radioastronomía y Astrofísica UNAM
References: <1122002894.6607.29.camel@xxxxxxxxxxxxxx> <17120.28178.788826.533753@xxxxxxxxxxxxxxxxxxxxxx> <20050722040917.GB7576@NYCMJCOWA2> <17120.30080.768671.539970@xxxxxxxxxxxxxxxxxxxxxx> <878xzykn0y.fsf@xxxxxxxxxxxxxxxxx> <17122.31220.22073.72951@xxxxxxxxxxxxxxxxxxxxxx> <20050724053713.GM2784@NYCMJCOWA2> <42E3D086.90403@xxxxxxxxxxxxxxxx> <17123.54753.371934.424875@xxxxxxxxxxxxxxxxxxxxxx> <42E3DD0C.5060108@xxxxxxxxxxx> <20050724230151.GP2784@NYCMJCOWA2>
User-agent: Mozilla Thunderbird 1.0 (X11/20050317)

By the same token, random-access disks are a useless feature, for they
can be replaced by sequential-access DECtapes that can be rewound and
selectively rewritten.  But at a price.

Files actually provide a fairly close analogy to the commonest means of representing Unicode strings.

Imagine a file system that implements files as streams of bytes. Now imagine that you want to read the Nth *line*. The only way to do this is to read through the file until you have encountered N-1 newlines. This is like finding the Nth character when using UTF-8 for strings.

Now imagine a file system that implements files as enumerated random-access records and uses exactly one record for each line. You can directly read the Nth line. This is like finding the Nth character when using UTF-32 for strings.

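Now imagine a file system that implements files as enumerated random-access records and uses one or more records for each line. This is like using UTF-16 for strings: you can jump straight to the Nth record, but you cannot tell which line it belongs to without reading the records before it.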
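
To make the analogy concrete, here is a minimal sketch in Python of finding the Nth character under each of the three encodings. The function names (nth_char_utf8, nth_char_utf32, nth_char_utf16) are hypothetical and chosen only for illustration, "character" is taken to mean a Unicode code point, N is 0-based, and the input is assumed to be well-formed little-endian data with no byte-order mark.

```python
def nth_char_utf8(data: bytes, n: int) -> str:
    """Sequential scan: like counting newlines in a byte-stream file."""
    i = 0
    count = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells us how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("unexpected byte at offset %d" % i)
        if count == n:
            return data[i:i + width].decode("utf-8")
        i += width
        count += 1
    raise IndexError(n)


def nth_char_utf32(data: bytes, n: int) -> str:
    """Direct access: exactly one 4-byte record per code point."""
    start = 4 * n
    if n < 0 or start + 4 > len(data):
        raise IndexError(n)
    return data[start:start + 4].decode("utf-32-le")


def nth_char_utf16(data: bytes, n: int) -> str:
    """Fixed-size 2-byte records, but a code point may span one or two of them."""
    i = 0
    count = 0
    while i + 2 <= len(data):
        unit = int.from_bytes(data[i:i + 2], "little")
        # A high surrogate announces that the next unit belongs to the same code point.
        width = 4 if 0xD800 <= unit <= 0xDBFF else 2
        if count == n:
            return data[i:i + width].decode("utf-16-le")
        i += width
        count += 1
    raise IndexError(n)
```

Only the UTF-32 version computes an offset from N directly. The UTF-16 version still has to scan from the start, even though its records are fixed-size, because any earlier code point may have used two of them.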


Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Nacional Autónoma de México
