This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.
On Monday 26 January 2004 02:13 pm, Matthew Dempsky wrote: > From my understanding there are a few issues being discussed here > (with possibly overlapping consequences) -- O(1) performance of > strings and using strings for octet sequences. > > SRFI-4 defines homogenous numeric vectors which seem like they would > suffice for implementing octet streams independant of strings and, if > not, a few Scheme implementations (larceny, MIT/GNU-scheme) provide > byte-vectors. > > As for O(1) performance, there's undoubtably a need for > non-O(1)-string-like-heavy-weight-objects (herein called ropes) which > some implementations choose to implement and call them strings (as > permitted by r5rs). The current proposal seems to urge > implementations to provide these as a seperate data type. Well color me dumb, but I don't see why getting O(1) is such a big deal. Let's say an implementation reads UTF-8 chars. A uni-string could be represented as a tuple. The required part is a byte-vector of characters that fit into 8 bits. If characters have 16 bit representation, use a second, 16-bit vector and a map of which chars are the long ones. If 16 bit, index into the 2nd array, else into the first. The 2nd array could be shortened by using the byte-vector to hold indexes into the 16-bit array in the mapped/unused bytevector slots. In the "worst case" (storage wise), we only need the 16-bit array. The mapping could be a bit-vector or whatever makes sense in a particular implementation. It is true that there are more code paths in this scheme (sic), but each kind of representation "knows that" and this still makes refs & sets ~ O(1). What am I missing here? Thanks, -KenD