Re: Surrogates and character representation
Tom Emerson wrote:
> Representing strings internally in UTF-8 is a loss though, since you
> lose random access to the string.
Random access to a previously accessed position works just fine - just
use the byte offset.
Random access to a position in a string that has not been previously
accessed is not in itself useful.
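
To illustrate the byte-offset point, here is a minimal Python sketch. The
helper name decode_at and the sample string are made up for illustration
and are not from any Scheme implementation; it assumes the buffer holds
valid UTF-8.

```python
def decode_at(buf, offset):
    """Decode the code point starting at byte `offset`.

    Returns (character, offset of the next code point).
    Hypothetical helper; assumes `buf` is valid UTF-8.
    """
    first = buf[offset]
    if first < 0x80:
        length = 1          # 0xxxxxxx: ASCII
    elif first >> 5 == 0b110:
        length = 2          # 110xxxxx: two-byte sequence
    elif first >> 4 == 0b1110:
        length = 3          # 1110xxxx: three-byte sequence
    elif first >> 3 == 0b11110:
        length = 4          # 11110xxx: four-byte sequence
    else:
        raise ValueError("offset is not at the start of a code point")
    return buf[offset:offset + length].decode("utf-8"), offset + length


text = "naïve 𝄞 clef".encode("utf-8")

# Sequential scan: remember the byte offset where the clef sign was seen.
saved = None
pos = 0
while pos < len(text):
    ch, nxt = decode_at(text, pos)
    if ch == "𝄞":
        saved = pos
    pos = nxt

# Later: go straight back to the saved byte offset, with no rescanning.
ch, _ = decode_at(text, saved)
```

The saved offset is a byte position, not a character count, which is all
that is needed to return to a place the program has already visited.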
> For some applications this isn't a big deal, but in general using UTF-8
> as an internal representation is a bad idea.
It's the other way round. Using UTF-8 as an internal representation is
just fine for *applications*. The problem is that certain *API*s have a
concept of indexing into a string, and unfortunately R5RS is one of
them. In itself indexing of strings is a useless feature, as it can be
replaced by a sequential-access cursor/iterator API - but historically
the Scheme cursor/iterator API uses integers for the "cursor". And
existing code moves the "cursor" forwards by adding 1.
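
A rough Python sketch of the difference, assuming valid UTF-8 input. The
cursor_* names are hypothetical and are not taken from R5RS or any
existing library; the cursor here happens to be a byte offset, but
callers are meant to treat it as opaque.

```python
def cursor_start(buf):
    """Cursor for the first character (hypothetical API)."""
    return 0


def cursor_end(buf, cur):
    """True once the cursor has moved past the last character."""
    return cur >= len(buf)


def cursor_next(buf, cur):
    """Advance to the next character by skipping continuation bytes (10xxxxxx)."""
    cur += 1
    while cur < len(buf) and (buf[cur] & 0xC0) == 0x80:
        cur += 1
    return cur


def cursor_ref(buf, cur):
    """Return the character the cursor points at."""
    return buf[cur:cursor_next(buf, cur)].decode("utf-8")


s = "λx".encode("utf-8")  # three bytes: CE BB 78

# Sequential access through the cursor API visits whole characters.
chars = []
cur = cursor_start(s)
while not cursor_end(s, cur):
    chars.append(cursor_ref(s, cur))
    cur = cursor_next(s, cur)

# What index-based code does instead: treat the cursor as an integer
# and add 1. On a UTF-8 buffer this lands inside the two-byte lambda.
bad = cursor_start(s) + 1
lands_mid_char = (s[bad] & 0xC0) == 0x80
```

Code written against cursor_next keeps working whatever the internal
encoding is; code that adds 1 only works if every character occupies
exactly one storage unit.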