Re: Surrogates and character representation

William D Clinger writes:
Per Bothner wrote:
> Random accesses to a position in a string that has not
> been previously accessed is not in itself useful.

In computational linguistics it is common to utilize standoff markup,
where features in a text are tagged in a separate file via character
ranges into the original. For example, we may have a file indicating
that certain prepositional phrases appear at offsets [25,40) and
[125,160) in the original file. I regularly work with multi-megabyte
text files annotated with such standoff markup, and not having
random access is a real detriment in these applications.
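
To make the use case concrete, here is a minimal sketch in Python of
how such offsets get consumed. The names Annotation and extract, the
"PP" label, and the sample sentence are made up for illustration and
do not come from any particular toolkit. It assumes offsets count
characters (code points) rather than bytes, and that ranges are
half-open as above.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One standoff record: a label plus a half-open range [start, end)."""
    label: str
    start: int
    end: int


def extract(text: str, annotations: list[Annotation]) -> list[tuple[str, str]]:
    """Return (label, covered substring) for each standoff annotation."""
    spans = []
    for ann in annotations:
        # Reject ranges that fall outside the text instead of silently clipping.
        if not (0 <= ann.start <= ann.end <= len(text)):
            raise ValueError(f"range [{ann.start},{ann.end}) is outside the text")
        # Each lookup jumps straight to an offset that was never visited before.
        spans.append((ann.label, text[ann.start:ann.end]))
    return spans


text = "The cat sat on the mat near the door."
markup = [Annotation("PP", 12, 22), Annotation("PP", 23, 36)]
for label, span in extract(text, markup):
    print(label, repr(span))
```

Every lookup here lands on a position the program has not touched
before, which is exactly the kind of access the quoted remark calls
not useful in itself. If reaching an offset requires scanning from the
start of the string, each annotation costs time proportional to its
offset unless the application builds its own index first.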

