[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: constant-time access to variable-width encodings
There are lots of possibilities to implement
strings of characters such
that the interface can pretend a string
is just a vector of Unicode Scalar
Values, while representing most characters
by 8 bits if possible.
One of the alternatives is representing
strings in pieces that are
homogeneous with respect to whether
they use 1, 2, or 3 bytes per char.
When string-set! modifies a character
such that its width increases,
the string is either split further,
or an entire section of the string is replaced
by a wider representation. The access
time becomes O(log |string|),
eventually but with very small constants.
This is a matter of personal preference,
but for Scheme I would rather
like to see more advanced implementations
of strings with a simple
interface than simple implementations
with a complicated interface.
The most simple interface I can think
of is "string = vector of char".
In fact, I would even prefer if the
default 'string' type in Scheme would
behave like a type of constants---exactly
like 'integer' which involves a
fair amount of magic behind the scene
but is conceptually simple and
often amazingly efficient.