
RE: String comparison under Latin-1 and Unicode

This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.



> it seems to me that STRING<? and
> others are better left for trivial tasks like sorting strings of digits;

Very few such trivial tasks exist in the application domain.

> for example, if you want to implement string sets as
> sorted lists, it's much better to use fast ordering predicate,

I argue that this is the only time a language-insensitive ordering
predicate is useful: implementing ADTs on top of strings.  And for that,
we still have the character-ordering predicates.  (Indeed, the
character-ordering predicates, much like the character-wise casemapping
procedures, seem useful for little else.)
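To make the ADT case concrete, here is a minimal sketch (in Python, purely
for illustration; the StringSet class is hypothetical) of a string set kept
as a sorted list.  It orders by raw code point, which is all the data
structure needs: any consistent total order will do, even one that puts
"Äpfel" after "zebra".

```python
import bisect


class StringSet:
    """Hypothetical set of strings kept as a sorted list.

    Ordering is plain code-point comparison (Python's built-in str <),
    which is fast and language-insensitive; the set only needs *some*
    consistent total order, not a linguistically correct one.
    """

    def __init__(self, items=()):
        self._items = sorted(set(items))

    def add(self, s):
        # Binary search for the insertion point; skip duplicates.
        i = bisect.bisect_left(self._items, s)
        if i == len(self._items) or self._items[i] != s:
            self._items.insert(i, s)

    def __contains__(self, s):
        i = bisect.bisect_left(self._items, s)
        return i < len(self._items) and self._items[i] == s

    def __iter__(self):
        return iter(self._items)


s = StringSet(["zebra", "Äpfel", "apple"])
s.add("banana")
print(list(s))  # ['apple', 'banana', 'zebra', 'Äpfel']
```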

> especially because collation is actually a complex process
> involving generation of "collation keys" which can be reused:

Agreed.  But for single-pass cases where you don't want to cache and reuse
sort keys, I would like the string preds to do the right thing, which, I
claim, is language-sensitive collation.
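A rough illustration of the difference between reused sort keys and
single-pass comparison, assuming a hypothetical toy_collation_key that only
folds case (a real collation key, e.g. one from a Unicode Collation
Algorithm implementation, is far more involved):

```python
import functools


def toy_collation_key(s):
    # Hypothetical stand-in for a real collation key. Primary level: ignore
    # case; tie-break on raw code points so "Apple" and "apple" order stably.
    return (s.casefold(), s)


def collate_cmp(a, b):
    # Single-pass comparison: builds both keys on every call, then discards them.
    ka, kb = toy_collation_key(a), toy_collation_key(b)
    return (ka > kb) - (ka < kb)


words = ["banana", "Apple", "cherry", "apple"]

# Reused keys: sorted() calls toy_collation_key exactly once per element.
by_key = sorted(words, key=toy_collation_key)

# Pairwise comparison: keys are rebuilt for every comparison sorted() makes.
by_cmp = sorted(words, key=functools.cmp_to_key(collate_cmp))

print(by_key)  # ['Apple', 'apple', 'banana', 'cherry']
```

Both orders agree; only the cost differs.  With key=, the key is built once
per element, while the cmp_to_key version rebuilds two keys per comparison,
which is roughly what a collating STRING<? called pairwise would have to do.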

So it's back to the string procs vs text procs thread of a couple of months
ago.

> the other hand, some Schemes have already implemented
> extended versions of these predicates accepting more than
> two arguments to make them similar to < and others

I see.  An n-ary version might foul the (APPLY STRING<? some-set-of-strings)
case, though, if it had to check the type of its last argument.
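A sketch of the concern, assuming (hypothetically) that the n-ary predicate
also accepted an optional trailing non-string argument, such as a language
specifier.  The plain n-ary version treats every argument alike, so applying
it to a list needs no special handling; the optional-argument variant has to
inspect the last argument's type on every call.  Both function names are
made up for illustration.

```python
def string_lt(*strings):
    """Hypothetical n-ary STRING<?: true iff the arguments are strictly
    increasing, mirroring Scheme's n-ary <.  Every argument is a string,
    so (APPLY STRING<? lst) corresponds to string_lt(*lst)."""
    return all(a < b for a, b in zip(strings, strings[1:]))


def string_lt_opt(*args):
    """Hypothetical variant with an optional trailing non-string argument
    (e.g. a language specifier).  It must type-check the last argument
    before it knows which arguments are the strings to compare."""
    if args and not isinstance(args[-1], str):
        strings = args[:-1]  # last argument is the option; drop it
    else:
        strings = args
    # This sketch ignores the option and compares code points either way.
    return string_lt(*strings)


words = ["apple", "banana", "cherry"]
print(string_lt(*words))                      # True
print(string_lt("banana", "apple"))           # False
print(string_lt_opt(*words, {"lang": "tr"}))  # True
```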

I also want (WITH-LANGUAGE ...) to solve the Turkish casemapping issue.
That's a separate point, though.