[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: constant-time access to variable-width encodings

There are lots of possibilities to implement strings of characters such
that the interface can pretend a string is just a vector of Unicode Scalar
Values, while representing most characters by 8 bits if possible.

One of the alternatives is representing strings in pieces that are
homogeneous with respect to whether they use 1, 2, or 3 bytes per char.
When string-set! modifies a character such that its width increases,
the string is either split further, or an entire section of the string is replaced
by a wider representation. The access time becomes O(log |string|),
eventually but with very small constants.

This is a matter of personal preference, but for Scheme I would rather
like to see more advanced implementations of strings with a simple
interface than simple implementations with a complicated interface.
The most simple interface I can think of is "string = vector of char".

In fact, I would even prefer if the default 'string' type in Scheme would
behave like a type of constants---exactly like 'integer' which involves a
fair amount of magic behind the scene but is conceptually simple and
often amazingly efficient.