Re: Parsing Scheme [was Re: strings draft]
> From: Ken Dickey <Ken.Dickey@xxxxxxxxxxxxxx>
> Excuse me if the obvious has already been addressed, but..
> It would be a *bad thing* if going from one locale to another
> changed a working Scheme program into a broken Scheme program.
> So, please be sure that the specification of character and
> string encoding and of portable Scheme source code defines
> Scheme source as being locale independent (by construction).
Do you agree that this is a portable, standard Scheme program?:
(define i 42) [a]
(display i)
(newline)
What about this next one? As nearly as I can tell, the formal syntax
in chapter 7 says that this next program is _not_ portable, but the
language in chapters 2 and 6 suggests that that is an unintended
deficiency of chapter 7:
(DEFINE I 42) [b]
(DISPLAY I)
(NEWLINE)
and if that is legal, is this a portable, standard Scheme program with
equivalent behavior?
(DEFINE I 42) [c]
(display i)
(newline)
Strictly speaking, R5RS seems to say that [a] is portable, [b] is not,
and among implementations on which [b] and [c] both run, they are not
required to be identical in meaning. The same strict reading implies
that the following is _not_ a portable Scheme program:
"H2O"
and that this is permitted:
(string-ci=? "define" "DEFINE") => #f
I tend to think that R5RS is deficient (relative to the authors'
intentions) in that regard. These restrictions would make it a real
mess (at best) to try to write a portable Scheme program that could
process Scheme source texts containing identifiers which use any
letters other than #\a..#\z.
For example, I would like this portable, standard program to produce
as output a one-line, portable, standard Scheme expression:
(display (char-downcase (char-upcase #\i)))
(newline)
however, the strictest reading of R5RS suggests that it is not
guaranteed to do so.
On the other hand, if [a], [b], and [c] are all portable, equivalent,
standard Scheme programs -- then in Turkish implementations,
CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
odd manner. I'm not so sure that that's terrible (and my proposals
for R6RS reflect that assessment): those procedures are doomed to
behave in a linguistically odd manner for a substantial number of
reasons, in many other contexts besides Turkish implementations.
While they _may_ behave in linguistically ideal ways in _some_
contexts -- that cannot be what they are for. (Even where they must
behave oddly, they can provide a good _approximation_ of something
linguistically useful.)
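To make the Turkish case concrete, here is a minimal sketch of a
linguistically faithful Turkish lowercasing and what it does to the
keyword in [b]. The names TURKISH-DOWNCASE and FOLD-TURKISH are
hypothetical, and the sketch assumes an implementation whose
INTEGER->CHAR accepts Unicode scalar values (R5RS does not promise
that):

```scheme
;; Hypothetical helpers, for illustration only; not from R5RS.
;; Assumes INTEGER->CHAR accepts Unicode scalar values.
(define small-dotless-i  (integer->char #x131)) ; U+0131
(define capital-dotted-i (integer->char #x130)) ; U+0130

;; Turkish rule: capital I lowers to dotless i, and dotted
;; capital I lowers to ordinary i.  Everything else is delegated
;; to the implementation's CHAR-DOWNCASE.
(define (turkish-downcase c)
  (cond ((char=? c #\I) small-dotless-i)
        ((char=? c capital-dotted-i) #\i)
        (else (char-downcase c))))

;; Fold a whole string with the Turkish rule.
(define (fold-turkish s)
  (list->string (map turkish-downcase (string->list s))))

(string=? (fold-turkish "DEFINE") "define") ; => #f
(string=? (fold-turkish "define") "define") ; => #t
```

Under such a mapping the folded form of DEFINE contains a dotless i,
so [b] and [c] could not mean the same thing as [a] unless the case
procedures depart from the Turkish rule.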
Rather, I propose that the standard character procedures be explicitly
related to both the syntax of portable standard Scheme and the syntax
of particular implementations. For example, R6RS should require that:
(char-downcase #\I) => #\i
and require that within a given implementation, if:
(char-alphabetic? c) => #t
then
(display c) (newline)
produces as output a one line expression that consists of a valid
identifier in that implementation.
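The two requirements above could be spot-checked along the following
lines. This is only a sketch: OPEN-INPUT-STRING comes from SRFI 6 /
R7RS rather than R5RS, READS-AS-IDENTIFIER? and REQUIREMENTS-HOLD?
are hypothetical names, and "reads back as a symbol" is my stand-in
for "is a valid identifier in that implementation":

```scheme
;; Sketch only.  OPEN-INPUT-STRING is SRFI 6 / R7RS, not R5RS.
;; Both procedure names below are hypothetical.

;; Write C as a one-character text and read it back; treat it as
;; a valid identifier if the reader returns a symbol.
(define (reads-as-identifier? c)
  (symbol? (read (open-input-string (string c)))))

;; Sample-based check, not a proof over the whole character set.
;; Non-alphabetic characters in SAMPLE are skipped, not read.
(define (requirements-hold? sample)
  (and (char=? (char-downcase #\I) #\i)
       (let loop ((cs (string->list sample)))
         (cond ((null? cs) #t)
               ((and (char-alphabetic? (car cs))
                     (not (reads-as-identifier? (car cs))))
                #f)
               (else (loop (cdr cs)))))))

(requirements-hold? "aZiI")
```

An implementation meeting the proposal would be expected to return
#t for any sample string.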
-t