
Re: Parsing Scheme [was Re: strings draft]




    > From: tb@xxxxxxxxxx (Thomas Bushnell, BSG)

    > Tom Lord <lord@xxxxxxx> writes:

    > > On the other hand, if [a], [b], and [c] are all portable, equivalent,
    > > standard Scheme programs -- then in Turkish implementations,
    > > CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
    > > odd manner.  

    > Not true!  

    > You can make [a], [b], and [c] all do the Right Thing, and not even
    > *have* CHAR-UPCASE or CHAR-DOWNCASE at all!

    > What they require is string-ci=? to behave Properly, in the contexts
    > where the Scheme reader uses it.

In R5RS, CHAR-UPCASE and CHAR-DOWNCASE are mandatory, and STRING-CI=?
is defined in terms of CHAR-CI=?.
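A minimal sketch of that dependency chain, in R5RS-style Scheme. The
names my-char-ci=? and my-string-ci=? are hypothetical stand-ins for
the standard procedures, and the sketch assumes that case-insensitive
character comparison is done by downcasing both sides.  R5RS says only
that upper and lower case letters are treated as the same, so this is
one plausible reading, not the report's text.

```scheme
;; Hypothetical stand-in for CHAR-CI=?.  Assumption: comparing two
;; characters "ignoring case" means comparing their downcased forms.
(define (my-char-ci=? a b)
  (char=? (char-downcase a) (char-downcase b)))

;; Hypothetical stand-in for STRING-CI=?: the strings must have the
;; same length, and every pair of corresponding characters must
;; satisfy my-char-ci=?.
(define (my-string-ci=? s1 s2)
  (let ((n (string-length s1)))
    (and (= n (string-length s2))
         (let loop ((i 0))
           (or (= i n)
               (and (my-char-ci=? (string-ref s1 i) (string-ref s2 i))
                    (loop (+ i 1))))))))
```

With the usual mapping of #\I to #\i, (my-string-ci=? "LIST" "list")
returns #t.  Whatever an implementation's CHAR-DOWNCASE does to #\I
flows straight through to string comparison, and from there to
identifier comparison.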

If [a], [b], and [c] are all portable, equivalent, standard Scheme
programs then this portable, standard program:


    ;; Copy standard input to standard output, downcasing each character.
    (let loop ((c (read-char)))
      (if (not (eof-object? c))
          (begin
            (display (char-downcase c))
            (loop (read-char)))))

must be able to read any one of them and write, as output, a Scheme
program with identical meaning, at _least_ if the resulting program is
read by the same implementation that ran the conversion.

There are two choices.   Either that program is permitted to convert
[b] and [c] into something other than [a] (such as by including some
dotless i's in the output) or it must convert [b] and [c] to [a].

In the latter case, CHAR-DOWNCASE behaves in a linguistically odd
manner for Turkish speakers, because it must either convert #\I to #\i
or leave #\I unchanged as #\I.

In the former case, since the output must still mean the same thing to
the same implementation's reader, the Turkish implementation must
provide that:

	(char-ci=? dotless-i #\i) => #t

which, again, is linguistically odd.
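The former case can be made concrete with a hypothetical Turkish-style
downcasing rule.  The names turkish-char-downcase and map-string are
invented for illustration, the rule treats only #\I specially (mapping
it to dotless i, U+0131), and nothing here is taken from a real
implementation.

```scheme
;; Dotless i, U+0131.  Assumes an implementation with Unicode characters.
(define dotless-i (integer->char #x131))

;; Hypothetical Turkish-style downcasing: #\I goes to dotless i instead
;; of #\i; every other character uses the ordinary CHAR-DOWNCASE.
(define (turkish-char-downcase c)
  (if (char=? c #\I)
      dotless-i
      (char-downcase c)))

;; Apply a character mapping to every character of a string.
(define (map-string f s)
  (list->string (map f (string->list s))))

;; Downcasing the identifier LIST under this rule does not yield list.
(define converted (map-string turkish-char-downcase "LIST"))
(display (string=? converted "list"))   ; prints #f
(newline)
```

For the converted program to keep its meaning, the same
implementation's reader would have to treat that result and list as
one identifier, which is exactly the (char-ci=? dotless-i #\i)
requirement.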

    > The question the reader needs to ask is "are these sequences of
    > characters the same identifier".  

Yes, and in R5RS that means "Are the constituent characters of the
identifiers equal in a case-independent sense?"  The rest follows from
that.

You say R5RS should not define identifier equivalence that way:

    > > I'm not so sure that that's terrible (and my proposals
    > > for R6RS reflect that assessment): those procedures are doomed to
    > > behave in a linguistically odd manner for a substantial number of
    > > reasons, in many other contexts besides Turkish implementations.

    > So punt them.  CHAR-UPCASE and CHAR-DOWNCASE are entirely unnecessary,
    > and since they cannot be sensibly implemented, and are entirely
    > unneeded, drop them!

The character case mappings would still need to be defined in order to
specify Scheme.  Reifying that definition into Scheme, in the form of
those procedures, is only natural.



    > > Rather, I propose that the standard character procedures be explicitly
    > > related to both the syntax of portable standard Scheme and the syntax
    > > of particular implementations.  For example, R6RS should require that:

    > > 	(char-downcase #\I) => #\i

    > Why?  R6RS should not have char-downcase at all.

The standard would still need to specify CHAR-DOWNCASE.  It would
still need to be possible to write a portable CHAR-DOWNCASE with
whatever machinery the standard did provide.  There is no good reason
not to take the simple route of reifying CHAR-DOWNCASE directly in
Scheme, and there is a very good reason to do so: it lets portable
programs accurately manipulate non-portable source texts.
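To illustrate what "whatever machinery the standard did provide" might
amount to, here is a hypothetical sketch in which a program rebuilds
downcasing from an explicit case-mapping table.  The names
make-downcase-table and table-char-downcase are invented, and the table
covers only ASCII A to Z.  A faithful table would have to carry each
implementation's full case mappings, which is precisely the information
CHAR-DOWNCASE already reifies.

```scheme
;; Hypothetical table-driven replacement for CHAR-DOWNCASE.  Covers only
;; ASCII A-Z and assumes upper and lower case letters are 32 code points
;; apart, which holds for ASCII and Unicode.
(define (make-downcase-table)
  (let loop ((code (char->integer #\A)) (table '()))
    (if (> code (char->integer #\Z))
        table
        (loop (+ code 1)
              (cons (cons (integer->char code)
                          (integer->char (+ code 32)))
                    table)))))

(define downcase-table (make-downcase-table))

;; Look the character up in the table; characters with no entry are
;; returned unchanged.
(define (table-char-downcase c)
  (let ((entry (assv c downcase-table)))
    (if entry (cdr entry) c)))
```

Such a table is inherently implementation-specific (a Turkish
implementation would need a different entry for #\I), so providing the
procedure itself is the simpler way to expose it.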

-t