Re: Parsing Scheme [was Re: strings draft]

Tom Lord <lord@xxxxxxx> writes:

> On the other hand, if [a], [b], and [c] are all portable, equivalent,
> standard Scheme programs -- then in Turkish implementations,
> CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
> odd manner.  

Not true!  

You can make [a], [b], and [c] all do the Right Thing without forcing
CHAR-UPCASE and CHAR-DOWNCASE to behave oddly.

What they require is that string-ci=? behave Properly in the contexts
where the Scheme reader uses it.

And that's no trouble: simply say that string-ci=? behaves Properly in
a certain specified locale, and that the Scheme reader uses it in that
locale.
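To make "a certain specified locale" concrete, here is a minimal sketch
in portable Scheme.  Everything in it is an assumption for illustration,
not anything R5RS or R6RS defines: a locale is modelled as a
character-folding procedure, `standard-scheme-locale` folds only ASCII
A-Z, `turkish-locale` is a hypothetical locale in which #\I folds to
dotless i (U+0131), and `locale-string-ci=?` is a made-up name chosen so
as not to shadow the built-in string-ci=?.  It assumes an implementation
whose characters cover Unicode.

```scheme
;; Hypothetical model: a "locale" is just a procedure that folds one character.

;; Fold ASCII A-Z (codes 65..90) to a-z; leave every other character alone.
(define (ascii-fold c)
  (let ((n (char->integer c)))
    (if (and (>= n 65) (<= n 90))
        (integer->char (+ n 32))
        c)))

;; Hypothetical standard locale: ASCII-only folding, the same everywhere.
(define (standard-scheme-locale) ascii-fold)

;; Hypothetical Turkish locale: #\I folds to dotless i (U+0131),
;; so "I" and "i" are NOT case-insensitively equal in this locale.
(define (turkish-locale)
  (lambda (c)
    (if (char=? c #\I)
        (integer->char #x131)
        (ascii-fold c))))

;; Compare two strings character by character under the given folding.
(define (locale-string-ci=? a b fold)
  (let ((len (string-length a)))
    (and (= len (string-length b))
         (let loop ((i 0))
           (or (= i len)
               (and (char=? (fold (string-ref a i))
                            (fold (string-ref b i)))
                    (loop (+ i 1))))))))
```

With those definitions, (locale-string-ci=? "I" "i" (standard-scheme-locale))
=> #t, while (locale-string-ci=? "I" "i" (turkish-locale)) => #f.  The
reader would only ever use the first form, so whatever the Turkish locale
does never changes which identifiers are the same.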
The question the reader needs to ask is "are these sequences of
characters the same identifier?"  *One* way to implement that is by
canonicalizing all identifiers and then matching the strings with
string=?.  But that is *not* the only implementation; all that is
actually needed is string-ci=?, and *not* any canonicalization.
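A minimal sketch of the second approach, assuming a reader that keeps a
simple list of the identifiers it has already seen.  `intern-identifier!`
and `identifier-table` are hypothetical names, and the built-in
string-ci=? stands in for the standard-locale comparison described above.
The first spelling seen is kept verbatim; later spellings are matched
against it, and nothing is ever upcased or downcased.

```scheme
;; Hypothetical reader-side identifier table that never canonicalizes.
;; It is a list of the spellings seen so far, newest first.
(define identifier-table '())

;; Return the spelling already interned for NAME if one matches it
;; case-insensitively; otherwise intern NAME exactly as written.
(define (intern-identifier! name)
  (let loop ((entries identifier-table))
    (cond ((null? entries)
           ;; No match: remember this spelling as-is.
           (set! identifier-table (cons name identifier-table))
           name)
          ;; Match: reuse the existing entry (the very same string object).
          ((string-ci=? (car entries) name) (car entries))
          (else (loop (cdr entries))))))
```

So (intern-identifier! "Foo") => "Foo", and a later
(intern-identifier! "FOO") returns that same "Foo" string, eq? to the
first.  A real reader would want something faster than a linear scan,
but the point stands: the lookup needs an equivalence predicate, not a
canonical form.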
> I'm not so sure that that's terrible (and my proposals
> for R6RS reflect that assessment): those procedures are doomed to
> behave in a linguistically odd manner for a substantial number of
> reasons, in many other contexts besides Turkish implementations.

So punt them.  CHAR-UPCASE and CHAR-DOWNCASE are entirely unnecessary,
and since they cannot be sensibly implemented anyway, drop them!

> Rather, I propose that the standard character procedures be explicitly
> related to both the syntax of portable standard Scheme and the syntax
> of particular implementations.  For example, R6RS should require that:
> 	(char-downcase #\I) => #\i

Why?  R6RS should not have char-downcase at all.

But it should certainly be true that
  (string-ci=? "I" "i" (standard-scheme-locale)) => #t

This is all that is necessary.