
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

William D Clinger writes:
Per Bothner wrote:
> Random accesses to a position in a string that has not
> been previously accessed is not in itself useful.

In computational linguistics it is common to utilize standoff markup,
where features in a text are tagged in a separate file via character
ranges into the original. For example, we may have a file indicating
that certain prepositional phrases appear at offsets [25,40) and
[125,160) in the original file. I regularly deal with multi-megabyte
text files annotated with such standoff markup, and lacking random
access is a real detriment in these applications.
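
To make the access pattern concrete, here is a minimal sketch of resolving standoff annotations against a text. The sample sentence, the offsets, the "PP" label, and the function name resolve_standoff are all made up for illustration; they are not taken from any real corpus or tool. The sketch assumes a string type that can be indexed directly by character offset, as Python's str can.

```python
# Hypothetical original text; in practice this would be a large file.
text = "The cat sat on the mat while the dog slept under the table."

# Hypothetical standoff annotations kept apart from the text:
# (start, end, label), with half-open character ranges [start, end).
annotations = [
    (12, 22, "PP"),
    (43, 58, "PP"),
]

def resolve_standoff(text, annotations):
    """Return (label, covered substring) for each annotation.

    Each lookup jumps straight to an offset that may never have been
    visited before, which is the random-access pattern in question.
    """
    return [(label, text[start:end]) for start, end, label in annotations]

resolved = resolve_standoff(text, annotations)
for label, span in resolved:
    print(label, repr(span))
```

If reaching a character offset required scanning from the start of the string, each such lookup would cost time proportional to the offset, which adds up quickly over many annotations on a large file.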

Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"