Re: case mappings



Michael Sperber <sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx> writes:

> I have expressed myself poorly: If I want my code to behave in a
> locale-specific way, then I'll use procedures that respect locale.
> Ditto for full string-to-string case-mapping in a locale-independent
> manner.  In other words, I need to indicate in my source code what I
> want.  Sometimes I want exactly the case mapping the current SRFI
> draft has---I *don't* want any of the others.

My concern is with code that *doesn't* specify anything, because it
"just wants ordinary case mapping".  What I'm concerned with is that
we don't name a procedure "turn this to upper case", with no locale
arguments, that just does some half-assed inadequate job.  It is
better to provide no procedure than such a procedure.

> I don't know what you think will become trivial.  Not much I can think
> of.  Even implementing the UnicodeData.txt case mappings is not
> trivial to do efficiently---which is why we'll provide a reference
> implementation for those pretty soon.

Designing the interfaces becomes trivial.  Implementing
UnicodeData.txt is not trivial to do efficiently, but that's no excuse
for failing to do it.  *Language writing systems are complex.*  Many
people, brought up on the Latin alphabet alone, got lazy and
accustomed to thinking that things like case mapping are trivial and
quick.  They aren't.  There are efficient methods (though I agree with
you that it's tricky to find them!).

My concern is that the *interfaces* not:

1) Prevent real conformance to Unicode.
2) Encourage Scheme programmers to do things which can't conform to
   Unicode even on Scheme systems that want to conform.

The current Scheme spec violates (1).  

The current SRFI here violates (2).

My proposal: to specify the existence of a locale argument, together
with the expectation that the argument defaults either to the current
environmental locale (LANG on Posix systems), or to a default of
UnicodeData+SpecialCasing.

I'm happy to then let this argument be optional, provided that in the
absence of the argument it defaults to a global variable, which in
turn is either the environment or the UnicodeData+SpecialCasing
default.
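To make the proposed interface concrete, here is a minimal sketch, written
in Python rather than Scheme purely for illustration.  Every name in it
(ROOT, locale_from_environment, default_locale, string_upcase) is
hypothetical and not part of any SRFI draft.  It assumes that Python's
built-in str.upper is an acceptable stand-in for the locale-independent
UnicodeData+SpecialCasing default, and it handles only one tailoring (the
Turkish and Azeri dotted i) as an example.

```python
import os
import re

# Hypothetical tag for the locale-independent UnicodeData+SpecialCasing default.
ROOT = "und"

def locale_from_environment():
    """Return the language code from LANG ('tr' from 'tr_TR.UTF-8'), or ROOT."""
    lang = os.environ.get("LANG", "")
    code = re.split(r"[_.@]", lang)[0].lower()
    return code if code and code not in ("c", "posix") else ROOT

# Global default consulted whenever the locale argument is omitted.
default_locale = locale_from_environment()

def string_upcase(s, locale=None):
    """Upper-case a whole string; the locale argument is optional."""
    if locale is None:
        locale = default_locale
    if locale in ("tr", "az"):
        # Turkish/Azeri tailoring: dotted i maps to dotted capital I (U+0130).
        s = s.replace("i", "\u0130")
    # Assumption: str.upper supplies the locale-independent default mapping.
    return s.upper()
```

A caller who wants a specific behavior passes it explicitly, for example
string_upcase("istanbul", "tr"), while a caller who passes nothing gets
whatever default_locale holds.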

But if you insist on saying that the case-mapping procedures operate
on characters instead of strings, then you will violate both (1) and
(2).
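
The limitation of a character-based interface can be seen in a small
sketch, again in Python for illustration only.  The helper names
char_upcase and upcase_per_char are hypothetical.  The point is that a
procedure returning exactly one character has no correct answer for a
character such as the German sharp s, whose upper-case form under
SpecialCasing is the two-character sequence "SS".

```python
def char_upcase(c):
    """Character-to-character mapping: must return exactly one character."""
    u = c.upper()
    # When the full mapping is longer than one character, a char-based
    # interface cannot express it; this sketch leaves the character alone.
    return u if len(u) == 1 else c

def upcase_per_char(s):
    """Upper-case a string by mapping each character in isolation."""
    return "".join(char_upcase(c) for c in s)

word = "stra\u00dfe"               # "strasse" spelled with sharp s
per_char = upcase_per_char(word)   # the sharp s survives unchanged
full = word.upper()                # string-to-string mapping yields "STRASSE"
```

Here per_char and full differ in both content and length, which is
exactly what a character-to-character signature cannot represent.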

Thomas