
Re: constant-time access to variable-width encodings

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



bear wrote:
Aaaand, this is yet another problem that goes away if you embrace
glyph=character instead of codepoint=character.

Huh? A glyph depends on a specific font. No way can we define Scheme characters in terms of glyphs.

Do you mean a (canonicalized) composite (combining) sequence? One problem is that you can't practically map one of those to a fixed-length integer value, so we would have to give up char->integer and integer->char. Also, if equal characters are to be eq?, they would have to be interned, like strings. Both of these changes are possible, but they would be a rather radical (and unneeded) departure from current practice.
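
To illustrate why a combining sequence does not fit in a single fixed-width integer, here is a minimal Java sketch. The class name CombiningSequenceDemo and the helper nfcCodePointCount are made up for this illustration; only java.text.Normalizer and String.codePointCount are real library calls. It counts the code points that remain after canonical composition (NFC): some sequences collapse to one precomposed code point, others do not.

```java
import java.text.Normalizer;

class CombiningSequenceDemo {
    // Hypothetical helper: number of code points left after canonical composition (NFC).
    static int nfcCodePointCount(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        // "e" + COMBINING ACUTE ACCENT composes to the single code point U+00E9.
        System.out.println(nfcCodePointCount("e\u0301"));
        // "q" + COMBINING DOT ABOVE has no precomposed form, so NFC leaves two code points.
        System.out.println(nfcCodePointCount("q\u0307"));
    }
}
```

Since base characters can carry any number of combining marks, the set of such sequences is open-ended, which is the difficulty for char->integer.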

With Unicode,
you *CANNOT* make assumptions about how strings are represented.
Two strings which are "equal" under unicode's required
equivalence predicates may be of different lengths and have not a
single codepoint in common, and the differences are purely
representation artifacts.

Nonetheless, Java defines String's equals routine in terms of code point equality, and Java programmers manage to get useful work done.
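
A small Java sketch of the difference between the two notions of equality discussed above. The class EqualsDemo and the helper canonicallyEqual are hypothetical names for this illustration; String.equals and java.text.Normalizer are the standard library. String.equals compares the stored UTF-16 code units, which gives the same answer as comparing code point by code point, while comparing after NFC normalization is one way (an assumption here, not the only one) to approximate canonical equivalence.

```java
import java.text.Normalizer;

class EqualsDemo {
    // Hypothetical helper: compare two strings after normalizing both to NFC.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";   // LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301";   // "e" + COMBINING ACUTE ACCENT

        // Different lengths and no code point in common.
        System.out.println(precomposed.length() + " vs " + decomposed.length());
        // Code-point-wise comparison, as String.equals does it.
        System.out.println(precomposed.equals(decomposed));
        // Comparison under canonical equivalence.
        System.out.println(canonicallyEqual(precomposed, decomposed));
    }
}
```

A program that needs equivalence-aware comparison can normalize at its boundaries and keep using the plain equals everywhere else.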
--
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/