Re: constant-time access to variable-width encodings
Aaaand, this is yet another problem that goes away if you embrace
glyph=character instead of codepoint=character.
Huh? A glyph depends on a specific font. No way can we define Scheme
characters in terms of glyphs.
Do you mean a (canonicalized) composite (combining) sequence? One
problem is you can't practically map one of those to a fixed-length
integer value, so we would have to give up char->integer and integer->char.
Also, if equal characters are to be eq? they would have to be interned,
like strings. Both of these changes are possible, but they are a rather
radical (and unneeded) departure from current practice.
you *CANNOT* make assumptions about how strings are represented.
Two strings which are "equal" under unicode's required
equivalence predicates may be of different lengths and have not a
single codepoint in common, and the differences are purely
Nonetheless, Java defines the String equals routine in terms of code
point equality, and Java programmers manage to get useful work done.
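
To make the distinction concrete, here is a minimal sketch (not taken from
the thread) of two canonically equivalent spellings of "e with acute
accent". The class name EquivalenceDemo and the helper canonicallyEqual are
made up for illustration; String.equals and java.text.Normalizer are the
standard Java APIs. The two strings differ in length and share no code
point, so String.equals reports them as different, while comparing their
NFC-normalized forms reports them as equal.

```java
import java.text.Normalizer;

public class EquivalenceDemo {
    // Precomposed form: the single code point U+00E9.
    public static final String PRECOMPOSED = "\u00E9";
    // Decomposed form: 'e' (U+0065) followed by the combining acute accent U+0301.
    public static final String DECOMPOSED = "e\u0301";

    // Hypothetical helper: compares two strings after normalizing both to NFC.
    public static boolean canonicallyEqual(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        // Element-by-element comparison: the two spellings differ.
        System.out.println(PRECOMPOSED.equals(DECOMPOSED));            // false
        // They do not even have the same length.
        System.out.println(PRECOMPOSED.length() + " vs " + DECOMPOSED.length()); // 1 vs 2
        // Under canonical equivalence they are the same text.
        System.out.println(canonicallyEqual(PRECOMPOSED, DECOMPOSED)); // true
    }
}
```

A program that needs equivalence-aware comparison can normalize at its
boundaries and keep using the plain equals routine internally, which is
roughly how the Java approach stays workable in practice.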