
Re: Parsing Scheme [was Re: strings draft]

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

    > From: Ken Dickey <Ken.Dickey@xxxxxxxxxxxxxx>

    > Excuse me if the obvious has already been addressed, but..

    > It would be a *bad thing* if going from one locale to another
    > changed a working Scheme program into a broken Scheme program.

    > So, please be sure that the specification of character and
    > string encoding and of portable Scheme source code defines
    > Scheme source as being locale independent (by construction).

Do you agree that this is a portable, standard Scheme program?:

	(define i 42)			[a]
        (display i)

What about this next one?   As nearly as I can tell, the formal syntax
in chapter 7 says that this next program is _not_ portable, but the
language in chapters 2 and 6 suggests that that is an unintended
deficiency of chapter 7:

	(DEFINE I 42)			[b]
        (DISPLAY I)

and if that is legal, is this a portable, standard Scheme program with
equivalent behavior?

	(DEFINE I 42)			[c]
        (display i)

Strictly speaking, R5RS seems to say that [a] is portable, [b] is not,
and among implementations on which [b] and [c] both run, they are not
required to be identical in meaning.  The same strict reading implies
that the following is _not_ a portable Scheme program:


and that this is permitted:

	(string-ci=? "define" "DEFINE") => #f

I tend to think that R5RS is deficient (relative to the authors'
intentions) in that regard.  These restrictions would make it a real
mess (at best) to try to write a portable Scheme program that could
process Scheme source texts containing identifiers which use any
letters other than #\a..#\z.

For example, I would like this portable, standard program to produce
as output a one-line, portable, standard Scheme expression:

	(display (char-downcase (char-upcase #\i)))

however, the strictest reading of R5RS suggests that it is not
guaranteed to do so.
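
The round trip in question can be checked directly. The following is a
minimal sketch that uses only R5RS procedures; the names
case-round-trips? and all-round-trip? are hypothetical helpers
introduced for illustration and are not part of any standard:

```scheme
;; Hypothetical helper (not in R5RS): does upcasing and then
;; downcasing a character give back the original character?
(define (case-round-trips? c)
  (char=? c (char-downcase (char-upcase c))))

;; Hypothetical helper: apply the check to every character of a string.
(define (all-round-trip? letters)
  (let loop ((i 0))
    (cond ((= i (string-length letters)) #t)
          ((case-round-trips? (string-ref letters i)) (loop (+ i 1)))
          (else #f))))

;; Check the 26 letters #\a..#\z.
(display (all-round-trip? "abcdefghijklmnopqrstuvwxyz"))
(newline)
```

An implementation that follows what appears to be the authors'
intention would report that every letter survives the round trip, but
under the strictest reading of R5RS nothing obliges it to.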

On the other hand, if [a], [b], and [c] are all portable, equivalent,
standard Scheme programs -- then in Turkish implementations,
CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
odd manner.  I'm not so sure that that's terrible (and my proposals
for R6RS reflect that assessment): those procedures are doomed to
behave in a linguistically odd manner for a substantial number of
reasons, in many other contexts besides Turkish implementations.
While they _may_ behave in linguistically ideal ways in _some_
contexts -- that cannot be what they are for.  (Even where they must
behave oddly, they can provide a good _approximation_ of something
linguistically useful.)

Rather, I propose that the standard character procedures be explicitly
related to both the syntax of portable standard Scheme and the syntax
of particular implementations.  For example, R6RS should require that:

	(char-downcase #\I) => #\i

and require that within a given implementation, if:

	(char-alphabetic? c) => #t

then:

	(display c) (newline)

produces as output a one line expression that consists of a valid
identifier in that implementation.
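
As a rough sketch, the first requirement could be turned into a
conformance check along these lines.  The procedure name
ascii-case-requirements-hold? is hypothetical, and the string-ci=?
clause is an assumption added here as the apparent consequence for
keywords, not something the proposal states.  The second requirement
is left out because checking it needs a reader over strings, which
R5RS does not provide.

```scheme
;; Hypothetical conformance check (not part of R5RS or any proposal text).
(define (ascii-case-requirements-hold?)
  (and
   ;; The requirement as proposed: #\I downcases to #\i.
   (char=? (char-downcase #\I) #\i)
   ;; Assumed consequence: keywords compare equal regardless of case.
   (string-ci=? "define" "DEFINE")))

(display (ascii-case-requirements-hold?))
(newline)
```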