[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Octet vs Char (Re: strings draft)

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.



On Monday 26 January 2004 02:13 pm, Matthew Dempsky wrote:
> From my understanding there are a few issues being discussed here
> (with possibly overlapping consequences) -- O(1) performance of
> strings and using strings for octet sequences.
>
> SRFI-4 defines homogenous numeric vectors which seem like they would
> suffice for implementing octet streams independant of strings and, if
> not, a few Scheme implementations (larceny, MIT/GNU-scheme) provide
> byte-vectors.
>
> As for O(1) performance, there's undoubtably a need for
> non-O(1)-string-like-heavy-weight-objects (herein called ropes) which
> some implementations choose to implement and call them strings (as
> permitted by r5rs).  The current proposal seems to urge
> implementations to provide these as a seperate data type.

Well color me dumb, but I don't see why getting O(1) is such a big deal.

Let's say an implementation reads UTF-8 chars.  A uni-string could be 
represented as a tuple.  The required part is a byte-vector of characters 
that fit into 8 bits.  If characters have 16 bit representation, use a 
second, 16-bit vector and a map of which chars are the long ones.  If 16 bit, 
index into the 2nd array, else into the first.  The 2nd array could be 
shortened by using the byte-vector to hold indexes into the 16-bit array in 
the mapped/unused bytevector slots.  In the "worst case" (storage wise), we 
only need the 16-bit array.  The mapping could be a bit-vector or whatever 
makes sense in a particular implementation.

It is true that there are more code paths in this scheme (sic), but each kind 
of representation "knows that" and this still makes refs & sets ~ O(1).

What am I missing here?

Thanks,
-KenD