
Re: Parsing Scheme [was Re: strings draft]




    > From: Ken Dickey <Ken.Dickey@xxxxxxxxxxxxxx>

    > Excuse me if the obvious has already been addressed, but..

    > It would be a *bad thing* if going from one locale to another
    > changed a working Scheme program into a broken Scheme program.

    > So, please be sure that the specification of character and
    > string encoding and of portable Scheme source code defines
    > Scheme source as being locale independent (by construction).


Do you agree that this is a portable, standard Scheme program?:

	(define i 42)			[a]
        (display i)
        (newline)

What about this next one?   As nearly as I can tell, the formal syntax
in chapter 7 says that this next program is _not_ portable, but the
language in chapters 2 and 6 suggests that that is an unintended
deficiency of chapter 7:

	(DEFINE I 42)			[b]
        (DISPLAY I)
        (NEWLINE)

and if that is legal, is this a portable, standard Scheme program with
equivalent behavior?


	(DEFINE I 42)			[c]
        (display i)
        (newline)
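
Whether a particular implementation treats [b] and [c] like [a] comes
down to whether its reader folds case in identifiers.  A minimal probe,
as a sketch only: READER-FOLDS-CASE? is a hypothetical name of mine, not
something R5RS defines, and the result says what one implementation
does, not what the standard requires.

```scheme
;; Hypothetical helper (not part of R5RS): reports whether this
;; implementation's reader maps the identifiers i and I to the
;; same symbol.
(define (reader-folds-case?)
  (eq? 'i 'I))

(display (if (reader-folds-case?)
             "case is folded: [b] and [c] should behave like [a]"
             "case is significant: [b] and [c] are likely to fail"))
(newline)
```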


Strictly speaking, R5RS seems to say that [a] is portable, [b] is not,
and among implementations on which [b] and [c] both run, they are not
required to be identical in meaning.  The same strict reading implies
that the following is _not_ a portable Scheme program:

	"H2O"


and that this is permitted:

	(string-ci=? "define" "DEFINE") => #f

I tend to think that R5RS is deficient (relative to the authors'
intentions) in that regard.  These restrictions would make it a real
mess (at best) to try to write a portable Scheme program that could
process Scheme source texts containing identifiers which use any
letters other than #\a..#\z.

For example, I would like this portable, standard program to produce
as output a one-line, portable, standard Scheme expression:

	(display (char-downcase (char-upcase #\i)))
        (newline)

however, the strictest reading of R5RS suggests that it is not
guaranteed to do so.
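
The same question can be asked of every letter #\a..#\z at once.  The
following is a sketch using only R5RS procedures;
NON-ROUNDTRIPPING-CHARS is a hypothetical helper name.  It returns the
characters of a string that do not survive CHAR-UPCASE followed by
CHAR-DOWNCASE.  An empty result on one implementation shows only that
this implementation round-trips those letters, not that R5RS promises
it.

```scheme
;; Hypothetical helper: collect, in order, the characters of STR for
;; which (char-downcase (char-upcase c)) is not the same character.
(define (non-roundtripping-chars str)
  ;; Walk from the last index down so that consing onto ACC
  ;; preserves the original left-to-right order.
  (let loop ((i (- (string-length str) 1)) (acc '()))
    (if (< i 0)
        acc
        (let ((c (string-ref str i)))
          (loop (- i 1)
                (if (char=? c (char-downcase (char-upcase c)))
                    acc
                    (cons c acc)))))))

(write (non-roundtripping-chars "abcdefghijklmnopqrstuvwxyz"))
(newline)
```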

On the other hand, if [a], [b], and [c] are all portable, equivalent,
standard Scheme programs -- then in Turkish implementations,
CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
odd manner.  I'm not so sure that that's terrible (and my proposals
for R6RS reflect that assessment): those procedures are doomed to
behave in a linguistically odd manner for a substantial number of
reasons, in many other contexts besides Turkish implementations.
While they _may_ behave in linguistically ideal ways in _some_
contexts -- that cannot be what they are for.  (Even where they must
behave oddly, they can provide a good _approximation_ of something
linguistically useful.)

Rather, I propose that the standard character procedures be explicitly
related to both the syntax of portable standard Scheme and the syntax
of particular implementations.  For example, R6RS should require that:

	(char-downcase #\I) => #\i

and require that within a given implementation, if:

	(char-alphabetic? c) => #t

then

	(display c) (newline)

produces as output a one-line expression that consists of a valid
identifier in that implementation.
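
Both requirements can be spot-checked from within an implementation.
This is only a sketch: the two procedure names are hypothetical, and it
assumes OPEN-INPUT-STRING (SRFI 6, later R7RS) is available, which R5RS
does not provide.  Reading the character back with READ stands in for
"display it and see whether the line is a valid identifier".

```scheme
;; Assumption: OPEN-INPUT-STRING (SRFI 6 / R7RS) exists here; it is
;; not an R5RS procedure.

;; First proposed requirement: (char-downcase #\I) => #\i
(define (downcase-I-ok?)
  (char=? (char-downcase #\I) #\i))

;; Second proposed requirement: an alphabetic character, taken as the
;; whole text of an expression, reads back as a symbol.  Characters
;; that are not alphabetic are outside the proposal, so they pass
;; trivially.
(define (alphabetic-char-is-identifier? c)
  (or (not (char-alphabetic? c))
      (symbol? (read (open-input-string (string c))))))

(display (downcase-I-ok?))
(newline)
(display (alphabetic-char-is-identifier? #\a))
(newline)
```

Running the second check over every character an implementation
supports would test the proposal for that implementation only.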


-t