
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: William D Clinger <cesura@xxxxxxxxxxx>
Subject: Re: Surrogates and character representation
From: Alex Shinn <alexshinn@xxxxxxxxx>
Date: Thu, 28 Jul 2005 16:25:24 +0900
Cc: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <y9lk6jb4ho8.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <y9lk6jb4ho8.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: Alex Shinn <alexshinn@xxxxxxxxx>

On 7/28/05, William D Clinger <cesura@xxxxxxxxxxx> wrote:
> 
> You certainly don't need character offsets to do a string
> search, but the naive algorithm without random access to
> characters is O(mn).  The Boyer-Moore algorithm improves
> this to O(n/m) in many cases.  I believe one can construct
> artificial examples to show that some O(n/m) cases would
> degrade to an intermediate complexity, or even back to O(mn),
> in UTF-8 or UTF-16 without character offsets.

This is not correct.  Any search algorithm that works on bytes
will work on UTF-8 strings.  That is, a C function that searches
for one char* within another (e.g. strstr) will return the correct
result when both arguments are valid UTF-8, no matter what
algorithm it uses.  UTF-8 is self-synchronizing, so a byte-level
match of a well-formed pattern always begins on a character
boundary.
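
To make the point concrete, here is a minimal sketch in Python rather
than C, with bytes objects standing in for char*.  It runs a
Boyer-Moore-Horspool search (a simplified Boyer-Moore variant, chosen
here only for brevity) directly over the UTF-8 bytes, using a
256-entry shift table and no character offsets.  The names
horspool_find and char_index and the sample strings are made up for
this illustration.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match, or -1 if absent."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Bad-character table indexed by byte value, so it always has 256 entries.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Skip ahead based on the byte aligned with the end of the needle.
        pos += shift[haystack[pos + m - 1]]
    return -1


def char_index(haystack: bytes, byte_offset: int) -> int:
    """Convert a byte offset to a code point index, only if one is wanted."""
    # Every byte that is not a continuation byte (10xxxxxx) starts a character.
    return sum(1 for b in haystack[:byte_offset] if (b & 0xC0) != 0x80)


text = "naïve café, 日本語のテキスト"
pattern = "日本語"
hay = text.encode("utf-8")
pat = pattern.encode("utf-8")

byte_pos = horspool_find(hay, pat)
print(byte_pos, char_index(hay, byte_pos))
```

The search itself never decodes a character.  A character index is
computed only afterwards, by a linear scan, and only for callers
that actually need one.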

It is in fact UTF-32 that has additional overhead for Boyer-Moore
searches as mentioned in my previous mail.

-- 
Alex

