
Re: constant-time access to variable-width encodings



From: Per Bothner <per@xxxxxxxxxxx>
Subject: constant-time access to variable-width encodings
Date: Wed, 13 Jul 2005 11:12:57 -0700

> The proposal is to allow string-ref to return #\partial for some indexes 
> representing non-initial bytes or low-surrogate values.

Interesting proposal, and I agree with the need for length-changing
mutation (see my other post).

I feel a bit uncomfortable, though, with the fact that indexes and
string-length differ among implementations, or even within the
same implementation under different character encodings.  It makes
a data structure that holds a string and indexes into it non-portable,
for example.
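
To make the portability concern concrete, here is a small sketch.  It is
in Python rather than Scheme, and Python's codecs merely stand in for
the internal representations different implementations might choose;
the helper unit_index_of is my own name, not part of any proposal.

```python
# Illustration only: each encoding plays the role of one implementation's
# internal string representation.
text = "\U0001D11E na\u00efve"   # MUSICAL SYMBOL G CLEF, space, "naive" with i-diaeresis

char_count = len(text)                              # counted in characters (code points)
utf8_units = len(text.encode("utf-8"))              # counted in bytes
utf16_units = len(text.encode("utf-16-le")) // 2    # counted in 16-bit code units


def unit_index_of(s, ch, encoding, unit_size):
    """Index of ch in s, measured in code units of the given encoding."""
    prefix = s[:s.index(ch)]
    return len(prefix.encode(encoding)) // unit_size


# The "same" position of the letter v under three ways of counting.
v_by_char = text.index("v")
v_by_utf8 = unit_index_of(text, "v", "utf-8", 1)
v_by_utf16 = unit_index_of(text, "v", "utf-16-le", 2)

print(char_count, utf8_units, utf16_units)
print(v_by_char, v_by_utf8, v_by_utf16)
```

A stored pair of (string, index) written out under one of these
countings points at a different place, or into the middle of a
character, when read back under another.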

I'd agree with the proposal if it introduced a different means of
indexing, other than the character count used for string-ref.  Call it
'offset' for now.  string-offset-ref, substring-offset etc. would
provide offset-based operations, while string-ref, substring etc.
would stay character-based.  It may be too cumbersome for the
core language, though.  And it is an API centered too much on
variable-length characters, which fixed-length character
implementations or other implementations (such as a tree of segments)
wouldn't care much about.
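
A rough sketch of what I mean by keeping the two kinds of indexing
apart, again in Python for brevity.  Everything here is hypothetical:
the class U8String, the PARTIAL sentinel (standing in for the proposed
#\partial), and the method names, which just mirror the Scheme names
above.  It assumes a UTF-8 representation.

```python
PARTIAL = object()   # stands in for the proposed #\partial value


class U8String:
    """Hypothetical string stored internally as UTF-8 bytes."""

    def __init__(self, text):
        self.data = text.encode("utf-8")

    # Character-based operations: portable indexes, but they need a scan
    # over this representation.
    def string_length(self):
        return len(self.data.decode("utf-8"))

    def string_ref(self, k):
        return self.data.decode("utf-8")[k]

    # Offset-based operations: constant-time lookup, but the offsets are
    # specific to this representation.
    def offset_length(self):
        return len(self.data)

    def string_offset_ref(self, off):
        b = self.data[off]
        if b & 0xC0 == 0x80:          # continuation (non-initial) byte
            return PARTIAL
        # Width of the character, determined by its lead byte.
        width = 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4
        return self.data[off:off + width].decode("utf-8")

    def substring_offset(self, start, end):
        # Both offsets must fall on character boundaries; otherwise the
        # decode raises UnicodeDecodeError.
        return self.data[start:end].decode("utf-8")


s = U8String("a\u00efb")              # bytes: 61 C3 AF 62
print(s.string_length(), s.offset_length())
print(s.string_ref(2), s.string_offset_ref(3))
print(s.string_offset_ref(2) is PARTIAL)
```

With this split, string-ref never has to return a partial value; only
the offset-based accessor does, and code that wants portable indexes
simply stays with the character-based procedures.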

--shiro