
Re: case mappings

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



On 7/14/05, Thomas Bushnell BSG <tb@xxxxxxxxxx> wrote:
> 
> So please, just deal with the reality.  There is no such thing as
> character-by-character case mapping.  Please do not say "everyone will
> want one even though it's buggy, so we'll require it."  Everyone will
> not want one.  If it's not standard, then programmers will use the
> string-by-string procedures, and be quite happy.

We're really not arguing here; we want exactly the same thing with
respect to Unicode case mappings.  I don't think character-level case
mappings should be provided at all.

However, if I'm to parse MIME and HTML and perhaps 90% of the
network protocols out there, I do need the simple, consistent case
mapping they use for ASCII characters.  This level of case mapping
is so prevalent in computing that R6RS would be foolish not to
provide it, whatever we decide regarding Unicode.  I just
want to make this clear to the authors, in case they decide to drop
Unicode-aware case mappings.
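
To make the ASCII-level requirement concrete, here is a minimal sketch
of the kind of folding a MIME or HTTP header parser relies on.  The
names ascii-downcase and ascii-string-ci=? are hypothetical, not taken
from any report or SRFI, and the code assumes char->integer yields
ASCII-compatible code points.

```scheme
;; ASCII-only case folding: a sketch, not part of any standard.
;; ascii-downcase and ascii-string-ci=? are hypothetical names.
;; Assumes char->integer returns ASCII-compatible code points.

(define (ascii-downcase c)
  ;; Map #\A..#\Z (65..90) to #\a..#\z; leave everything else untouched.
  (let ((n (char->integer c)))
    (if (and (>= n 65) (<= n 90))
        (integer->char (+ n 32))
        c)))

(define (ascii-string-ci=? s1 s2)
  ;; Compare two strings, ignoring case for ASCII letters only.
  (let ((len (string-length s1)))
    (and (= len (string-length s2))
         (let loop ((i 0))
           (or (= i len)
               (and (char=? (ascii-downcase (string-ref s1 i))
                            (ascii-downcase (string-ref s2 i)))
                    (loop (+ i 1))))))))
```

With these definitions, (ascii-string-ci=? "Content-Type" "content-type")
is true, and non-ASCII characters are compared exactly as they are.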

The difference is that Unicode case mapping is a linguistic utility,
which is only meaningful at the string level.  Any algorithm that
uses Unicode case mappings at the character level either really
wants ASCII-level case mappings (as in the above examples) or is
fundamentally broken and can never correctly perform Unicode
string-level case mappings.
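
A short sketch of why the character-level approach cannot work: any
procedure built by mapping char-upcase over a string returns a string
of the same length, while Unicode's full uppercase of the German sharp
s (U+00DF) is the two-character sequence "SS".  The name
char-wise-upcase is hypothetical.

```scheme
;; Character-by-character upcasing: a sketch of the broken approach.
;; char-wise-upcase is a hypothetical name.

(define (char-wise-upcase s)
  ;; Each input character yields exactly one output character, so the
  ;; result always has the same length as the input.
  (list->string (map char-upcase (string->list s))))

;; A six-character word containing U+00DF can therefore never become
;; the seven-character "STRASSE"; only a string-level operation can
;; change the length.
```

For pure ASCII input the two approaches agree, which is why the
problem is easy to miss in testing.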

One option is to provide only the string-level operations, and to
require them to work with ASCII.  These operations could optionally
provide the full Unicode mappings, special cases and all.

It would be nice to provide at least a placeholder for locales, but
this opens another can of worms.  What is a locale?  In the
implementation I provide for Chicken and Gauche it's just a string,
but some Schemes might want locale objects.  Furthermore, there's
probably a (current-locale).  Given that, does

  (string-ci=? s1 s2)

mean the same thing as

  (string-ci=? s1 s2 (current-locale))

or the same as

  (string-ci=? s1 s2 (independent-locale))?
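
The choice of default matters in practice.  As a rough sketch, using
locales as plain strings and the well-known Turkish rule that
uppercase I lowercases to dotless i (U+0131), the same two strings
compare differently depending on the locale.  Everything here is a
hypothetical stand-in: current-locale and independent-locale are
defined locally for the demonstration, demo-string-ci=? is not a
proposed API, and the code assumes characters are Unicode code points.

```scheme
;; Hypothetical stand-ins; none of these names come from a standard.
(define (current-locale) "tr")      ; pretend the user's locale is Turkish
(define (independent-locale) "")    ; locale-neutral marker

(define (fold-char c locale)
  ;; Lowercase one character for comparison purposes.
  (let ((n (char->integer c)))
    (cond ((and (string=? locale "tr") (char=? c #\I))
           (integer->char 305))     ; U+0131, dotless i (Turkish rule)
          ((and (>= n 65) (<= n 90))
           (integer->char (+ n 32))) ; plain ASCII folding
          (else c))))

(define (demo-string-ci=? s1 s2 locale)
  ;; Case-insensitive comparison under the given locale string.
  (let ((len (string-length s1)))
    (and (= len (string-length s2))
         (let loop ((i 0))
           (or (= i len)
               (and (char=? (fold-char (string-ref s1 i) locale)
                            (fold-char (string-ref s2 i) locale))
                    (loop (+ i 1))))))))
```

Here (demo-string-ci=? "I" "i" (independent-locale)) is true, while
(demo-string-ci=? "I" "i" (current-locale)) is false, so a protocol
parser that silently picked up the current locale would break for
Turkish users.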

-- 
Alex