[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: constant-time access to variable-width encodings

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.




There are lots of possibilities to implement strings of characters such
that the interface can pretend a string is just a vector of Unicode Scalar
Values, while representing most characters by 8 bits if possible.

One of the alternatives is representing strings in pieces that are
homogeneous with respect to whether they use 1, 2, or 3 bytes per char.
When string-set! modifies a character such that its width increases,
the string is either split further, or an entire section of the string is replaced
by a wider representation. The access time becomes O(log |string|),
eventually but with very small constants.

This is a matter of personal preference, but for Scheme I would rather
like to see more advanced implementations of strings with a simple
interface than simple implementations with a complicated interface.
The most simple interface I can think of is "string = vector of char".

In fact, I would even prefer if the default 'string' type in Scheme would
behave like a type of constants---exactly like 'integer' which involves a
fair amount of magic behind the scene but is conceptually simple and
often amazingly efficient.

Sebastian.