
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: bear <bear@xxxxxxxxx>
Subject: Re: Surrogates and character representation
From: Alan Watson <a.watson@xxxxxxxxxxxxxxxx>
Date: Fri, 29 Jul 2005 10:33:43 -0500
Cc: Shiro Kawai <shiro@xxxxxxxx>, tree@xxxxxxxxxxxxx, per@xxxxxxxxxxx, srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <Pine.LNX.4.58.0507281512010.22175@xxxxxxxxxxxxxx>
Organization: Centro de Radioastronomía y Astrofísica UNAM
References: <42E8546F.9000407@xxxxxxxxxxx> <17128.22540.135687.288180@xxxxxxxxxxxxxxxxxxxxxx> <Pine.LNX.4.58.0507280119280.28883@xxxxxxxxxxxxxx> <20050728.000652.1016278026.shiro@xxxxxxxx> <42E90FA0.6070005@xxxxxxxxxxxxxxxx> <Pine.LNX.4.58.0507281512010.22175@xxxxxxxxxxxxxx>
User-agent: Mozilla Thunderbird 1.0 (X11/20050317)

bear wrote:

> > (1) Are your "random" accesses into your corpus linguistics strings
> > really random, do they have significant locality, or could they be
> > arranged to have significant locality?
>
> Speaking for myself, I would say they are as close to random as
> makes no difference.


Thanks for your answer.

I think I'm convinced that representing strings in plain UTF-8 is a losing representation for this application. Or, generalizing, this application really needs strings that have constant-time random access and not just linear-time traversal.

If I wanted to rescue UTF-8 (because I really really really want to keep conversion to UTF-8 as a constant-time operation), I could maintain a vector of byte offsets to every Nth character.
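As a rough illustration of the offset-vector idea, here is a minimal sketch in Python (chosen purely for illustration, not Scheme). It assumes immutable strings, well-formed UTF-8, and that "character" means a Unicode code point. The class name `IndexedUTF8` and the `stride` parameter (the N above) are hypothetical and not part of SRFI 75 or any Scheme implementation. A lookup jumps to the recorded offset at or before the requested index and then skips at most N - 1 encoded characters, so random access costs O(N) instead of O(length), while the stored bytes can still be handed out as UTF-8 with no conversion.

```python
class IndexedUTF8:
    """Hypothetical sketch: UTF-8 bytes plus byte offsets of every Nth character."""

    def __init__(self, text: str, stride: int = 32):
        if stride < 1:
            raise ValueError("stride must be positive")
        self.data = text.encode("utf-8")  # the only copy of the string
        self.stride = stride
        self.offsets = []  # offsets[j] is the byte offset of character j * stride
        count = 0
        for pos, byte in enumerate(self.data):
            # Every byte except a continuation byte (10xxxxxx) starts a character.
            if byte & 0xC0 != 0x80:
                if count % stride == 0:
                    self.offsets.append(pos)
                count += 1
        self.length = count

    def __len__(self) -> int:
        return self.length

    @staticmethod
    def _seq_len(lead: int) -> int:
        # Sequence length from a lead byte; assumes well-formed UTF-8 input.
        if lead < 0x80:
            return 1
        if lead < 0xE0:
            return 2
        if lead < 0xF0:
            return 3
        return 4

    def char_at(self, k: int) -> str:
        if not 0 <= k < self.length:
            raise IndexError(k)
        pos = self.offsets[k // self.stride]  # nearest checkpoint at or before k
        for _ in range(k % self.stride):      # skip at most stride - 1 characters
            pos += self._seq_len(self.data[pos])
        end = pos + self._seq_len(self.data[pos])
        return self.data[pos:end].decode("utf-8")

    def to_utf8(self) -> bytes:
        # The representation already is UTF-8, so "conversion" just returns it.
        return self.data
```

The choice of N trades space for time: the offset vector holds about length/N entries, and each lookup scans at most N - 1 encoded characters. Changing a character in place could shift the byte offsets of everything after it, so a mutable string would need to update or rebuild the index.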


Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Nacional Autónoma de México

References:
- Re: Surrogates and character representation
  - From: Per Bothner
- Re: Surrogates and character representation
  - From: Tom Emerson
- Re: Surrogates and character representation
  - From: bear
- Re: Surrogates and character representation
  - From: Shiro Kawai
- Re: Surrogates and character representation
  - From: Alan Watson
- Re: Surrogates and character representation
  - From: bear
