[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: constant-time access to variable-width encodings
This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 are here. Eventually, the entire history will be moved there, including any new messages.
There are lots of possibilities to implement
strings of characters such
that the interface can pretend a string
is just a vector of Unicode Scalar
Values, while representing most characters
by 8 bits if possible.
One of the alternatives is representing
strings in pieces that are
homogeneous with respect to whether
they use 1, 2, or 3 bytes per char.
When string-set! modifies a character
such that its width increases,
the string is either split further,
or an entire section of the string is replaced
by a wider representation. The access
time becomes O(log |string|),
eventually but with very small constants.
This is a matter of personal preference,
but for Scheme I would rather
like to see more advanced implementations
of strings with a simple
interface than simple implementations
with a complicated interface.
The most simple interface I can think
of is "string = vector of char".
In fact, I would even prefer if the
default 'string' type in Scheme would
behave like a type of constants---exactly
like 'integer' which involves a
fair amount of magic behind the scene
but is conceptually simple and
often amazingly efficient.