This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
On Wed, 13 Jul 2005, Per Bothner wrote:

> Shiro Kawai wrote:
>> I feel a bit uncomfortable, though, with the fact that indexes
>> and string-length differ among different implementations, or
>> even in the same implementations with different character
>> encodings.
>
> I can see an issue if you try to write that out using one
> implementation, and read it back in with another.  Not sure how
> important that is.

Actually, it's supposed to be a non-problem for unicode-compliant
applications, because the unicode string equivalence algorithm is
*required* to treat strings as equivalent regardless of how the
graphemes within them are encoded.

Speaking of which, the current draft of the SRFI is not
unicode-compliant in that its string=? predicate does not detect
strings which are "canonically equivalent" according to the Unicode
Consortium's required string equivalence algorithm.  They define
strings as equal if they contain a sequence of graphemes which are
equivalent, and you're defining strings as equal if they contain a
sequence of codepoints which are equivalent.

Aaaand, this is yet another problem that goes away if you embrace
glyph=character instead of codepoint=character.

With Unicode, you *CANNOT* make assumptions about how strings are
represented.  Two strings which are "equal" under unicode's required
equivalence predicates may be of different lengths and have not a
single codepoint in common, and the differences are purely
representation artifacts.

If you embrace glyph=character then at least a given string will
portably be a fixed number of characters, and a unicode-aware char=?
predicate can bury representation artifacts below the level of notice
of the programmer or user.

Bear