Re: String comparison under Latin-1 and Unicode

This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.



> > it seems to me that STRING<? and
> > others are better left for trivial tasks like sorting strings of digits;
>
> Very few such trivial tasks exist in the application domain.
>
> > for example, if you want to implement string sets as
> > sorted lists, it's much better to use fast ordering predicate,
>
> I argue that this is the only time that a language-insensitive ordering
> predicate is useful: implementing ADTs on top of strings.  And for that,
> the character-ordering predicates remain.  (Indeed, the character-ordering
> predicates, much like the character-wise casemapping procedures, seem useful
> for little else.)

I understand your position, but my point is a philosophical one: the approach
to text processing in the 21st century is very different from the one
envisioned by the Scheme authors. We now know that text is a rather complex
entity, and an array of bytes is not the best representation for it if you
want to implement the text processes described in the Unicode book
effectively. Tweaking old Scheme functions and data types cannot bring us
significantly closer to the ultimate solution, because it is hard to modify
the existing semantics without breaking a lot of existing code. The old data
types and functions have their purpose: there is still a lot of
language-insensitive processing, such as manipulating file names or
implementing compilers, for which the simplicity and effectiveness of the
Scheme primitives are a bonus. But if we want to design a real text
processing library, I think we should forget about strings and concentrate
on a new data type: TEXT.

Sergei