Re: String comparison under Latin-1 and Unicode

This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.

>>>>> On Fri, 10 Mar 2000 14:43:05 -0500, "Sergei Egorov" <esl@xxxxxxxxxxxxxxx> said:

> I don't agree with this proposal: it seems to me that STRING<? and
> others are better left for trivial tasks like sorting strings of
> digits; they have a simple definition based on CHAR<? that, in its
> turn, is based on the internal encoding (ASCII or Unicode). It is still
> very useful as an ordering predicate with no language-dependent
> meaning; for example, if you want to implement string sets as sorted
> lists, it's much better to use a fast ordering predicate, even if the
> induced ordering doesn't make any linguistic sense. On the other hand, some

A reasonable argument.
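To illustrate the point about a "fast ordering predicate with no language-dependent meaning", here is a minimal sketch in Python (not Scheme, and not anything from SRFI 13). It assumes CHAR<? compares Unicode code points, so STRING<? becomes a plain lexicographic comparison of code points. The names `string_lt` and `StringSet` are hypothetical. The ordering is perfectly good for keeping a sorted set, even though it puts "Zebra" before "apple" and "é" after "f".

```python
import bisect


def string_lt(s1: str, s2: str) -> bool:
    """Code-point lexicographic order, analogous to STRING<? built on CHAR<?."""
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            # First differing character decides, by raw code point.
            return ord(c1) < ord(c2)
    # Shared prefix is equal: the shorter string sorts first.
    return len(s1) < len(s2)


class StringSet:
    """A set of strings kept as a sorted list (Python's str '<' is code-point order)."""

    def __init__(self):
        self._items = []

    def add(self, s: str) -> None:
        i = bisect.bisect_left(self._items, s)
        if i == len(self._items) or self._items[i] != s:
            self._items.insert(i, s)

    def __contains__(self, s: str) -> bool:
        i = bisect.bisect_left(self._items, s)
        return i < len(self._items) and self._items[i] == s

    def items(self) -> list:
        return list(self._items)
```

Adding "pear", "Apple", "apple" and "Pear" to a `StringSet` yields `["Apple", "Pear", "apple", "pear"]`: fast and consistent, but not what a dictionary would print.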

> I would suggest using new names for collation predicates, especially
> because collation is actually a complex process involving generation
> of "collation keys" which can be reused:

> (string->collation-key str language-specifier) => c-key
> (collation-key<? c-key1 c-key2) => bool
> (collation-key<=? c-key1 c-key2) => bool
> ...  and then you can define your own collation predicates:
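A rough sketch of the collation-key idea, in Python rather than Scheme. Everything here is hypothetical: `string_to_collation_key` ignores its language specifier and uses toy rules (level 1: base letters with accents removed and case folded; level 2: accents; level 3: case). It is not a real collation algorithm such as the Unicode Collation Algorithm. The point it shows is reuse: a key is computed once per string, and comparisons are then cheap tuple comparisons.

```python
import unicodedata


def string_to_collation_key(s: str, language_specifier: str = "und") -> tuple:
    """Toy three-level key; language_specifier is accepted but ignored (hypothetical)."""
    groups = []  # one [base_char, accents] entry per base character
    for c in unicodedata.normalize("NFD", s):
        if unicodedata.combining(c) and groups:
            groups[-1][1] += c  # attach the accent to the preceding base letter
        else:
            groups.append([c, ""])
    primary = "".join(b for b, _ in groups).casefold()    # letters only
    secondary = tuple(a for _, a in groups)               # accents, by position
    tertiary = tuple(b.isupper() for b, _ in groups)      # case: lower sorts first
    return (primary, secondary, tertiary)


def collation_key_lt(k1: tuple, k2: tuple) -> bool:
    return k1 < k2


def collation_key_le(k1: tuple, k2: tuple) -> bool:
    return k1 <= k2
```

Python's `sorted(words, key=string_to_collation_key)` computes each key exactly once, which is the reuse the proposal is after: `["Banana", "apple", "Apple", "résumé", "resume"]` sorts to `["apple", "Apple", "Banana", "resume", "résumé"]`.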

I would much prefer either:
	(collation->predicate language-specifier ordering) -> pred?
	(pred? string1 string2) -> bool

where LANGUAGE-SPECIFIER is as Ben Goetter <goetter@xxxxxxxxxxxxxxxx>
suggested, and ORDERING is one of the strings "<", "<=", or "=".

This seems far more useful, and more efficient, than converting every
string you want to compare into a collation key!
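For comparison, a minimal Python sketch of the COLLATION->PREDICATE shape (again not Scheme, and not a defined SRFI 13 procedure). The table `_KEY_FUNCTIONS`, its single "en" entry, and the accent- and case-insensitive `_primary_key` rule are all hypothetical stand-ins for real language-specific collation. The caller never sees a key; the predicate derives whatever it needs from the two strings on each call.

```python
import unicodedata
from typing import Callable


def _primary_key(s: str) -> str:
    """Toy primary-strength rule (hypothetical): strip accents, fold case."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


# Hypothetical mapping from language specifiers to key rules.
_KEY_FUNCTIONS = {"en": _primary_key}

# The three ORDERING strings from the message.
_ORDERINGS = {
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    "=": lambda a, b: a == b,
}


def collation_to_predicate(language_specifier: str, ordering: str) -> Callable[[str, str], bool]:
    """Return a two-string predicate; raises KeyError for unknown arguments."""
    key = _KEY_FUNCTIONS[language_specifier]
    compare = _ORDERINGS[ordering]

    def pred(s1: str, s2: str) -> bool:
        return compare(key(s1), key(s2))

    return pred
```

With `eq = collation_to_predicate("en", "=")`, `eq("Résumé", "resume")` is true. This sketch builds both keys in full on every call for simplicity; a real implementation could compare incrementally and stop at the first difference, which is where the efficiency argument above would pay off.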