
collation algorithm




The proposed semantics for collation of strings
(using string>? & friends) by pointwise comparison
of codepoints is in direct conflict with the Unicode
standard for locale-independent collation of strings,
as expressed in

http://www.unicode.org/reports/tr10/

The Unicode collation algorithm abstracts over
representation issues such as how characters are
rendered as sequences of individual codepoints,
so that the test is for canonical (glyph) equivalence
rather than codepoint equivalence.
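
To make the conflict concrete, here is a minimal sketch
in Python (not Scheme). It is not the collation algorithm
from TR10; it only uses canonical normalization (NFD) from
the standard unicodedata module to show the two notions of
equality disagreeing. The helper name canonical_equal is
made up for illustration.

```python
import unicodedata

# The same glyph, e-acute, in two canonically equivalent encodings.
composed = "\u00e9"       # one codepoint: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # two codepoints: 'e' + COMBINING ACUTE ACCENT


def canonical_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare strings after canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Python's built-in string comparison is pointwise over codepoints,
# so the two spellings are unequal and land on opposite sides of "f".
pointwise_equal = composed == decomposed
splits_around_f = decomposed < "f" < composed

# After normalization the two spellings are the same string.
equivalent = canonical_equal(composed, decomposed)
```

Under pointwise comparison the two spellings of the same
glyph sort on opposite sides of "f", which is exactly the
kind of result the Unicode algorithm is designed to avoid.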

It's hard to say what the right thing to do is.
I think that hardly any programming language
implementors will actually support the Unicode
standard in this case, and I'm not sure they should;
it devotes massive resources to compensating for the
technical flaws in the rest of the Unicode standard.
Certainly if you value the speed of your code,
you'll want to provide string comparison operators
that don't go to this much trouble.

Since I figure most language implementors will ignore
it (and *are* ignoring it, in Java and C#), this part
of the Unicode standard will probably be abandoned
eventually.

At the same time, I want to leave it legal for
Scheme implementors who are actually doing Unicode
support to conform to it if they want to.

			Bear