Re: Parsing Scheme [was Re: strings draft]



Tom Lord <lord@xxxxxxx> writes:

> CHAR-UPCASE and CHAR-DOWNCASE are mandatory and STRING-CI=? is defined
> in terms of CHAR-CI=?

If you're asking what should be in the next RnRS, then there is no
sense in which CHAR-UPCASE is mandatory.  The editors can choose to
include it, or not.  I am speaking of what I would like the next RnRS
to say, precisely because the current version is entirely unsuitable
for correct character handling.

There *is no* good implementation of R5RS if you want the Scheme
character type to be based upon Unicode.

> In the latter case, CHAR-DOWNCASE behaves in a linguistically odd way
> for Turkish speakers because it either converts #\I to #\i or #\I to #\I.

This is not "linguistically odd", it's incorrect.  It is in fact
incorrect in a way which violates the best Unicode practices.  It is
this which I spoke of a while back when I first entered the thread.
If you are saying that it doesn't matter that the R5RS character type
cannot be used with the best Unicode practices, then I disagree
strongly.  
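
To make the failure concrete, here is a small sketch.  It is in Python
only because Python's str.lower and str.upper apply the Unicode
default, locale-independent mappings; the variable names are mine and
nothing here is proposed for Scheme.  It illustrates two things a
char-to-char CHAR-DOWNCASE cannot express: the right answer for #\I
depends on the language, and some mappings do not produce a single
character at all.

```python
# Python's str.lower/str.upper use Unicode's default case mappings,
# with no locale input. A one-argument CHAR-DOWNCASE is in the same
# position: it has only the character to go on.

DOTLESS_I = "\u0131"         # LATIN SMALL LETTER DOTLESS I
CAPITAL_DOTTED_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# The default mapping sends "I" to "i" ...
default_lower_I = "I".lower()
# ... but Turkish text needs dotless i as the lowercase of "I".
turkish_lower_I = DOTLESS_I

# Some mappings are one-to-many, so no char -> char procedure can
# return them.
upper_sharp_s = "\u00df".upper()           # sharp s uppercases to "SS"
lower_dotted_I = CAPITAL_DOTTED_I.lower()  # "i" + U+0307 combining dot
```

Whatever single answer a char-level procedure picks for #\I, it is
wrong for someone, and for sharp s it has no correct answer to pick.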

> The character casemappings would still need to be defined to specify
> Scheme.  Reifying that definition into Scheme in the form of those
> procedures is only natural.

Huh?  Why on earth would it?  We could specify Scheme and give *no*
case-mapping functions, and instead only specify the output identifier
matching function.  I am coming to believe that it should not be
specified as string-ci=?, in fact, because a-with-accent-grave is not
ci=? to a-without-accent, but a system might sensibly choose to
treat them as equivalent for identifiers.

There should be string-id=? (or some other name) which implements the
Scheme identifier matching rules, which should be specified for the
required character set, and left unspecified for all other
characters.  
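
A rough sketch of what I have in mind, written in Python for
illustration only.  The name string_id_eq, the helper _id_key, and the
choice of ASCII letters as the "required character set" are all
assumptions of this sketch, not anything from an existing RnRS.  The
matching rule is given as a fixed equivalence table over the required
letters; for every other character this sketch happens to use exact
comparison, which is just one of the behaviours "unspecified" would
allow.

```python
# Hypothetical sketch of STRING-ID=?. The names and the choice of
# required character set (ASCII letters) are assumptions.

_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
_LOWER = "abcdefghijklmnopqrstuvwxyz"

# Fixed equivalence table covering the required letters only.
_ID_CLASS = {u: l for u, l in zip(_UPPER, _LOWER)}


def _id_key(ch):
    # Required letters map to their equivalence class. Any other
    # character is left alone; that is this sketch's choice for the
    # part the standard would leave unspecified.
    return _ID_CLASS.get(ch, ch)


def string_id_eq(a, b):
    """Return True if a and b name the same identifier in this sketch."""
    return len(a) == len(b) and all(
        _id_key(x) == _id_key(y) for x, y in zip(a, b)
    )
```

An implementation that wanted a-with-accent-grave to match plain a in
identifiers could extend the table for its own characters without
touching the required part, and no general case-mapping procedure is
exposed to user code.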

None of this requires or even implicitly uses a case mapping function.

> The standard would still need to specify CHAR-DOWNCASE.   

Why?  Is there some government bureau that will shut us down if the
next RnRS eliminates it?

I don't mind STRING-DOWNCASE, of course, which should have a locale
argument and be specified to permit the Correct Unicode Thing.
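
As a minimal sketch of the shape such a procedure could take (Python
again, for illustration): the name string_downcase, the locale tags,
and the default tag "und" are assumptions of mine.  It handles only the
Turkish and Azerbaijani treatment of I and falls back to the Unicode
default mapping otherwise.  It does not handle the rule for a capital I
followed by a combining dot above, so it is not a complete
implementation of the Unicode special-casing rules.

```python
# Hypothetical STRING-DOWNCASE with a locale argument. The function
# name and the locale tags are assumptions for illustration.

def string_downcase(s, locale="und"):
    """Lowercase s; locale is a language tag such as "tr" or "en"."""
    if locale in ("tr", "az"):
        # Turkish and Azerbaijani: dotted capital I -> i, and plain
        # capital I -> dotless i. Apply these before the default
        # mapping, which would otherwise turn "I" into "i".
        s = s.replace("\u0130", "i").replace("I", "\u0131")
    # Default (locale-independent) Unicode lowercase for the rest.
    return s.lower()
```

Because it works on whole strings, the result is free to differ in
length from the input, which a char-level procedure can never do.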

Thomas