
Re: constant-time access to variable-width encodings

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

From: Per Bothner <per@xxxxxxxxxxx>
Subject: constant-time access to variable-width encodings
Date: Wed, 13 Jul 2005 11:12:57 -0700

> The proposal is to allow string-ref to return #\partial for some indexes 
> representing non-initial bytes or low-surrogate values.

Interesting proposal, and I agree with the need for length-changing
mutation (see my other post).

I feel a bit uncomfortable, though, with the fact that indexes and
string-length would differ among implementations, or even within the
same implementation under different character encodings.  That makes
a data structure that holds a string together with indexes into it
non-portable, for example.
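
To make the concern concrete, here is a rough sketch in Python (used
only as neutral illustration, not as a proposed Scheme API).  It
assumes an implementation that indexes strings by encoded code units;
the helper names code_unit_length and code_unit_index are hypothetical.

```python
# Illustration only: how length and index vary when a string is
# indexed by code units of its internal encoding.
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}  # bytes per code unit

def code_unit_length(s: str, encoding: str) -> int:
    """Length of s counted in code units of the given encoding."""
    return len(s.encode(encoding)) // UNIT_SIZE[encoding]

def code_unit_index(s: str, char_index: int, encoding: str) -> int:
    """Code-unit index at which the character at char_index starts."""
    return code_unit_length(s[:char_index], encoding)

# 'a', GREEK SMALL LETTER LAMDA, MUSICAL SYMBOL G CLEF, 'b'
text = "a\u03bb\U0001d11eb"

for enc in UNIT_SIZE:
    # The same string reports a different length and a different
    # index for 'b' under each encoding.
    print(enc, code_unit_length(text, enc), code_unit_index(text, 3, enc))
```

A stored pair such as (string, index-of-b) would therefore mean
different things depending on which encoding produced it.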

I'd agree with the proposal if it introduced a separate means of
indexing, distinct from the character count used by string-ref.  Call
it 'offset' for now.  string-offset-ref, substring-offset etc. would
provide offset-based operations, while string-ref, substring etc.
would keep working on character indexes.  It may be too cumbersome
for the core language, though.  It is also an API centered on
variable-length character encodings, which implementations with
fixed-length characters or other representations (such as a tree of
segments) wouldn't care much about.
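
A minimal sketch of the two kinds of access, again in Python purely for
illustration and assuming a UTF-8 backed string.  The class Utf8String,
the PARTIAL marker (standing in for #\partial) and the method names are
hypothetical; they only mirror the names suggested above.

```python
PARTIAL = object()  # hypothetical stand-in for #\partial

class Utf8String:
    """String stored as UTF-8 bytes, with two indexing schemes."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")

    def _width(self, offset: int) -> int:
        """Byte width of the character whose lead byte is at offset."""
        lead = self.data[offset]
        if lead < 0x80:
            return 1
        if lead < 0xE0:
            return 2
        if lead < 0xF0:
            return 3
        return 4

    def string_offset_ref(self, offset: int):
        """Constant time: character starting at byte offset, or PARTIAL."""
        if self.data[offset] & 0xC0 == 0x80:  # continuation byte
            return PARTIAL
        end = offset + self._width(offset)
        return self.data[offset:end].decode("utf-8")

    def string_ref(self, index: int) -> str:
        """Linear time: the index-th character, found by scanning."""
        offset = 0
        for _ in range(index):
            offset += self._width(offset)
        return self.string_offset_ref(offset)

s = Utf8String("a\u03bbb")        # bytes: 61 CE BB 62
print(s.string_ref(1))            # character index 1
print(s.string_offset_ref(1))     # byte offset 1, same character
print(s.string_offset_ref(2) is PARTIAL)  # offset 2 is mid-character
```

Character indexes stay portable across implementations, while offsets
are only meaningful for the encoding that produced them.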