Re: Parsing Scheme [was Re: strings draft]

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.



Tom Lord <lord@xxxxxxx> writes:

> On the other hand, if [a], [b], and [c] are all portable, equivalent,
> standard Scheme programs -- then in Turkish implementations,
> CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
> odd manner.  

Not true!  

You can make [a], [b], and [c] all do the Right Thing, and not even
*have* CHAR-UPCASE or CHAR-DOWNCASE at all!

What they require is string-ci=? to behave Properly, in the contexts
where the Scheme reader uses it.

And that's no trouble; simply say that string-ci=? behaves Properly in
a certain specified locale, and that the Scheme reader uses it in that
locale.  

The question the reader needs to ask is "are these sequences of
characters the same identifier?"  *One* way to implement that is by
canonicalizing all identifiers, and then matching the strings with
string=?.  But that is *not* the only implementation, and all that is
actually needed is string-ci=?, and *not* any canonicalization
technique.
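
A minimal sketch of the two approaches, for illustration only.  The
names same-identifier?, canonicalize, and same-identifier/canonical?
are hypothetical, and string-ci=? here is the ordinary two-argument
R5RS procedure, not a locale-aware one:

```scheme
;; Hypothetical helpers; none of these names come from a standard.

;; Direct approach: the reader only asks whether two spellings
;; name the same identifier.  No case conversion is exposed.
(define (same-identifier? a b)
  (string-ci=? a b))

;; Canonicalizing approach: fold every spelling to one case, then
;; compare exactly.  This is the variant that needs char-downcase.
(define (canonicalize s)
  (list->string (map char-downcase (string->list s))))

(define (same-identifier/canonical? a b)
  (string=? (canonicalize a) (canonicalize b)))

(same-identifier? "Lambda" "LAMBDA")            ; => #t
(same-identifier/canonical? "Lambda" "LAMBDA")  ; => #t
```

Both answer the reader's question for these inputs; only the second
depends on a per-character case mapping.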

> I'm not so sure that that's terrible (and my proposals
> for R6RS reflect that assessment): those procedures are doomed to
> behave in a linguistically odd manner for a substantial number of
> reasons, in many other contexts besides Turkish implementations.

So punt them.  CHAR-UPCASE and CHAR-DOWNCASE cannot be sensibly
implemented and are entirely unnecessary, so drop them!

> Rather, I propose that the standard character procedures be explicitly
> related to both the syntax of portable standard Scheme and the syntax
> of particular implementations.  For example, R6RS should require that:
> 
> 	(char-downcase #\I) => #\i

Why?  R6RS should not have char-downcase at all.

But it should certainly be true that
  (string-ci=? "I" "i" (standard-scheme-locale)) => #t

This is all that is necessary.
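
One way such a locale-taking comparison might look, as a rough sketch
under stated assumptions.  The three-argument form is not standard
Scheme, so the sketch uses the made-up name locale-string-ci=? rather
than redefining string-ci=?.  It models a "locale" as a
character-folding procedure, and it assumes char->integer yields
ASCII codes for A-Z and a-z, which R5RS does not guarantee:

```scheme
;; Hypothetical sketch: a locale is modeled as a folding procedure.
;; None of these names come from R5RS or any SRFI.

;; Fold only ASCII A-Z to a-z, using character codes (assumed ASCII)
;; instead of char-downcase.
(define (ascii-fold c)
  (let ((n (char->integer c)))
    (if (and (>= n 65) (<= n 90))
        (integer->char (+ n 32))
        c)))

;; The locale the Scheme reader would use, in this sketch.
(define (standard-scheme-locale) ascii-fold)

;; Compare two strings character by character after folding each
;; character in the given locale.
(define (locale-string-ci=? a b locale)
  (and (= (string-length a) (string-length b))
       (let loop ((i 0))
         (or (= i (string-length a))
             (and (char=? (locale (string-ref a i))
                          (locale (string-ref b i)))
                  (loop (+ i 1)))))))

(locale-string-ci=? "I" "i" (standard-scheme-locale)) ; => #t
```

The folding procedure is an internal detail of the comparison here;
nothing outside it needs CHAR-UPCASE or CHAR-DOWNCASE.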

Thomas