This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Shiro Kawai wrote:
> I feel a bit uncomfortable, though, with the fact that indexes and string-length differ among different implementations, or even in the same implementations with different character encodings.
I'm assuming a single character encoding per implementation: either UTF-8, UTF-16, or a plain array of 20-bit characters. Supporting general character encodings is problematic, since you cannot always tell if a byte is an initial or subsequent (partial) character.
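For the UTF-8 case, the two properties relied on here (a character occupies a variable number of bytes, and an initial byte can always be told apart from a subsequent one) can be sketched as follows. This is only an illustration, assuming an R7RS Scheme that provides (scheme base); the names utf8-size and utf8-initial-byte? are hypothetical and are not part of the proposal.

```scheme
(import (scheme base))

;; Hypothetical helper: number of bytes UTF-8 uses to encode character ch.
(define (utf8-size ch)
  (let ((cp (char->integer ch)))
    (cond ((< cp #x80) 1)
          ((< cp #x800) 2)
          ((< cp #x10000) 3)
          (else 4))))

;; Hypothetical helper: true when byte b starts a character, that is,
;; when it is not a continuation byte of the form 10xxxxxx.
(define (utf8-initial-byte? b)
  (not (<= #x80 b #xBF)))
```

In an encoding without this self-synchronizing property, no test like utf8-initial-byte? can be written by looking at a single byte, which is the difficulty with general encodings.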
In explaining/specifying my proposal it might be useful to add:

  (define (char-representation-size ch)
    ;; Implementations will do this more efficiently!
    (string-length (make-string 1 ch)))

> It makes a datastructure that holds a string and its indexes non-portable, for example.
I can see an issue if you try to write that out using one implementation, and read it back in with another. Not sure how important that is.
> I'd agree the proposal if it introduces a different means of indexing, other than character count used for string-ref. Call it 'offset' for now. string-offset-ref, substring-offset etc. would provide offset-based operation, while string-ref, substring etc. work on character-based op.
That might be reasonable. But ...
> Though it may be too cumbersome for core language.
Well, the complication is that existing code will be less efficient, and people have a choice between using string-ref (portable to R5RS but potentially slow) and string-offset-ref (portable to R6RS only but fast).
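The cost difference between the two kinds of access can be sketched by modelling a UTF-8 string as a bytevector. This is a rough illustration only, assuming an R7RS Scheme with (scheme base); utf8-offset-ref and utf8-index-ref are hypothetical stand-ins for string-offset-ref and a character-counting string-ref, not definitions from any report or SRFI.

```scheme
(import (scheme base))

;; Byte length of the UTF-8 sequence whose first byte is b.
;; Assumes b is an initial byte, not a continuation byte.
(define (utf8-sequence-length b)
  (cond ((< b #x80) 1)
        ((< b #xE0) 2)
        ((< b #xF0) 3)
        (else 4)))

;; Offset-based access: decode the character that starts at byte offset
;; off. The work does not depend on where off lies in the string.
(define (utf8-offset-ref bv off)
  (let ((n (utf8-sequence-length (bytevector-u8-ref bv off))))
    (string-ref (utf8->string bv off (+ off n)) 0)))

;; Character-index access: skip k characters from the start, then decode.
;; The work grows linearly with k.
(define (utf8-index-ref bv k)
  (let loop ((off 0) (k k))
    (if (= k 0)
        (utf8-offset-ref bv off)
        (loop (+ off (utf8-sequence-length (bytevector-u8-ref bv off)))
              (- k 1)))))
```

A loop that calls utf8-index-ref for every index is quadratic in the string length, while the same loop written with offsets is linear; that is the efficiency trade-off described above.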
An alternative idea is to have a cache that maps the most recent (char index, offset) mapping. One problem is that even an immutable string now requires a mutable cache, with possible synchronization issues.
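One possible shape for such a cache is sketched below, again modelling the string as a UTF-8 bytevector in an R7RS Scheme with (scheme base). The record type cached-utf8 and the procedure cached-index->offset are hypothetical names chosen for this sketch, and the single-entry policy is an assumption, not a worked-out design.

```scheme
(import (scheme base))

;; The bytes never change, but the two cache fields are mutable.
(define-record-type cached-utf8
  (make-cached-utf8-raw bytes last-index last-offset)
  cached-utf8?
  (bytes cached-bytes)
  (last-index cached-last-index set-cached-last-index!)
  (last-offset cached-last-offset set-cached-last-offset!))

(define (make-cached-utf8 str)
  (make-cached-utf8-raw (string->utf8 str) 0 0))

;; Byte length of the UTF-8 sequence whose first byte is b.
(define (utf8-sequence-length b)
  (cond ((< b #x80) 1) ((< b #xE0) 2) ((< b #xF0) 3) (else 4)))

;; Map character index k to a byte offset. The scan starts from the
;; cached pair when k is at or beyond it, otherwise from the beginning.
(define (cached-index->offset cs k)
  (let ((bv (cached-bytes cs))
        (from-cache? (>= k (cached-last-index cs))))
    (let loop ((i (if from-cache? (cached-last-index cs) 0))
               (off (if from-cache? (cached-last-offset cs) 0)))
      (if (= i k)
          (begin
            ;; Two separate writes: another thread could observe a
            ;; mismatched (index, offset) pair between them.
            (set-cached-last-index! cs i)
            (set-cached-last-offset! cs off)
            off)
          (loop (+ i 1)
                (+ off (utf8-sequence-length
                        (bytevector-u8-ref bv off))))))))
```

Sequential forward access becomes cheap with this scheme, but the two unsynchronized field updates show where the synchronization issue comes from.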
> And this is too much variable-length-character centric API, which fixed-length character implementation or other implementations (such as tree of segments) wouldn't care much.
I'm not sure what your point is. Certainly a more complex data structure is appropriate for (say) a text editor, especially once you support character "attributes".
--
    --Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/