Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: Tom Emerson <tree@xxxxxxxxxxxxx>
Subject: Re: Surrogates and character representation
From: bear <bear@xxxxxxxxx>
Date: Thu, 28 Jul 2005 01:24:15 -0700 (PDT)
Cc: Per Bothner <per@xxxxxxxxxxx>, srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <17128.22540.135687.288180@xxxxxxxxxxxxxxxxxxxxxx>
References: <y9lu0ig46v8.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <17127.44572.207464.724852@xxxxxxxxxxxxxxxxxxxxxx> <5fb7e0870507271853a6defce@xxxxxxxxxxxxxx> <17128.19464.258589.23946@xxxxxxxxxxxxxxxxxxxxxx> <5fb7e08705072720162f6a8d1a@xxxxxxxxxxxxxx> <17128.20298.707693.881280@xxxxxxxxxxxxxxxxxxxxxx> <42E8546F.9000407@xxxxxxxxxxx> <17128.22540.135687.288180@xxxxxxxxxxxxxxxxxxxxxx>

On Wed, 27 Jul 2005, Tom Emerson wrote:

>Per Bothner writes:
>> If you have the luxury of reading your entire file into memory (and in
>> the process expanding its size by a good bit) you can of course do all
>> kinds of processing and index-building.
>
>I have text files containing 100MB worth of UTF-8 encoded text with
>character offsets in supplemental files. This happens regularly in
>corpus linguistics.

Uh, seconded.  Same reason (corpus linguistics).  There is no
practical way to keep track of "marks" for hundreds of thousands
(or millions) of interlinear annotations, and be able to serialize
the string and read it back with marks intact. Numeric offsets do
a better, more natural job.

				Bear
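
For readers who want something concrete, here is a minimal sketch of the numeric-offset approach described above. It is an illustration only, not code from anyone in this thread. It assumes annotations are stored as code-point offsets alongside the UTF-8 text, and the names `char_to_byte_offset`, `slice_by_chars`, and the annotation fields `start`, `end`, and `label` are hypothetical. Serializing the offsets is just a matter of writing out integers, which is the point being made about "marks".

```python
import json


def char_to_byte_offset(data: bytes, char_offset: int) -> int:
    """Return the byte index where the char_offset-th code point starts.

    Assumes `data` is well-formed UTF-8. This is a linear scan; a real
    corpus tool would likely keep a sparse index instead (assumption).
    """
    seen = 0
    for i, b in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes; any other
        # byte starts a new code point.
        if (b & 0xC0) != 0x80:
            if seen == char_offset:
                return i
            seen += 1
    if seen == char_offset:
        return len(data)  # offset one past the last character
    raise IndexError("character offset out of range")


def slice_by_chars(data: bytes, start: int, end: int) -> str:
    """Decode only the code points in [start, end) from UTF-8 bytes."""
    b0 = char_to_byte_offset(data, start)
    b1 = char_to_byte_offset(data, end)
    return data[b0:b1].decode("utf-8")


# A tiny stand-in for a corpus file: it contains a 2-byte character
# (U+00EF) and a 4-byte character outside the BMP (U+1D4B3).
text = "na\u00efve \U0001d4b3 corpus"
data = text.encode("utf-8")

# Standoff annotations, kept apart from the text as plain integers.
annotations = [
    {"start": 0, "end": 5, "label": "ADJ"},
    {"start": 6, "end": 7, "label": "SYM"},
    {"start": 8, "end": 14, "label": "NOUN"},
]

# Round trip through a supplemental file format (JSON here, by assumption).
serialized = json.dumps(annotations)
restored = json.loads(serialized)

spans = [slice_by_chars(data, a["start"], a["end"]) for a in restored]
```

Because the offsets count code points rather than bytes or UTF-16 code units, the annotation on the non-BMP character is one unit wide regardless of how the text is encoded on disk.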
