Re: Parsing Scheme [was Re: strings draft]
Tom Lord <lord@xxxxxxx> writes:
> On the other hand, if [a], [b], and [c] are all portable, equivalent,
> standard Scheme programs -- then in Turkish implementations,
> CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
> odd manner.
Not true!
You can make [a], [b], and [c] all do the Right Thing, and not even
*have* CHAR-UPCASE or CHAR-DOWNCASE at all!
What they require is string-ci=? to behave Properly, in the contexts
where the Scheme reader uses it.
And that's no trouble; simply say that string-ci=? behaves Properly in
a certain specified locale, and that the Scheme reader uses it in that
locale.
The question the reader needs to ask is "are these sequences of
characters the same identifier?" *One* way to implement that is by
canonicalizing all identifiers, and then matching the strings with
string=?. But that is *not* the only implementation, and all that is
actually needed is string-ci=?, and *not* any canonicalization
technique.
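
To make the contrast concrete, here is a rough sketch of the two
approaches. The procedure names (CANONICALIZE, SAME-IDENTIFIER/CANON?,
SAME-IDENTIFIER?) are made up for illustration and come from no
standard or proposal; the two-argument string-ci=? is the ordinary
R5RS one, not the locale-taking variant discussed in this thread.

```scheme
;; Sketch only: all three procedure names below are hypothetical.

;; Approach 1: canonicalize every identifier, then compare exactly.
;; Note that this needs a per-character case mapping (CHAR-DOWNCASE).
(define (canonicalize s)
  (list->string (map char-downcase (string->list s))))

(define (same-identifier/canon? a b)
  (string=? (canonicalize a) (canonicalize b)))

;; Approach 2: ask the question directly.  No canonical form is ever
;; built, and no per-character case mapping is exposed to the caller.
(define (same-identifier? a b)
  (string-ci=? a b))
```

For plain ASCII identifiers the two agree. The point of the second
form is that the reader only ever depends on the comparison, so the
comparison is the only thing a standard would have to pin down.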
> I'm not so sure that that's terrible (and my proposals
> for R6RS reflect that assessment): those procedures are doomed to
> behave in a linguistically odd manner for a substantial number of
> reasons, in many other contexts besides Turkish implementations.
So punt them. CHAR-UPCASE and CHAR-DOWNCASE are entirely unnecessary,
and since they cannot be sensibly implemented either, drop them!
> Rather, I propose that the standard character procedures be explicitly
> related to both the syntax of portable standard Scheme and the syntax
> of particular implementations. For example, R6RS should require that:
>
> (char-downcase #\I) => #\i
Why? R6RS should not have char-downcase at all.
But it should certainly be true that
(string-ci=? "I" "i" (standard-scheme-locale)) => #t
This is all that is necessary.
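
A minimal sketch of what such a locale-taking comparison might look
like, purely as an illustration. Everything here is an assumption: a
"locale" is modelled as a character-folding procedure, the names
ASCII-FOLD and STRING-CI=?/LOCALE are invented, STANDARD-SCHEME-LOCALE
is given a stand-in definition, only ASCII letters are folded, and
char->integer is assumed to return ASCII codes for those letters.

```scheme
;; Sketch only: hypothetical names, ASCII-only folding.

;; Fold #\A..#\Z to #\a..#\z; leave every other character alone.
(define (ascii-fold c)
  (let ((n (char->integer c)))
    (if (and (>= n 65) (<= n 90))
        (integer->char (+ n 32))
        c)))

;; Stand-in for the locale the Scheme reader would use.
(define (standard-scheme-locale) ascii-fold)

;; Compare two strings character by character under LOCALE's folding.
(define (string-ci=?/locale a b locale)
  (and (= (string-length a) (string-length b))
       (let loop ((i 0))
         (or (= i (string-length a))
             (and (char=? (locale (string-ref a i))
                          (locale (string-ref b i)))
                  (loop (+ i 1)))))))

;; The requirement above, restated with the hypothetical procedure:
(string-ci=?/locale "I" "i" (standard-scheme-locale))
```

The folding procedure stays internal to the comparison, so nothing
like char-downcase has to be exported for this to work.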
Thomas