Re: Parsing Scheme [was Re: strings draft]
> From: tb@xxxxxxxxxx (Thomas Bushnell, BSG)
> Tom Lord <lord@xxxxxxx> writes:
> > On the other hand, if [a], [b], and [c] are all portable, equivalent,
> > standard Scheme programs -- then in Turkish implementations,
> > CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
> > odd manner.
> Not true!
> You can make [a], [b], and [c] all do the Right Thing, and not even
> *have* CHAR-UPCASE or CHAR-DOWNCASE at all!
> What they require is string-ci=? to behave Properly, in the contexts
> where the Scheme reader uses it.
CHAR-UPCASE and CHAR-DOWNCASE are mandatory and STRING-CI=? is defined
in terms of CHAR-CI=?
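To make that dependency concrete, here is a rough sketch of a STRING-CI=? written purely in terms of CHAR-CI=?. It is an illustration, not the report's text, and the name my-string-ci=? is made up for this message:

```scheme
;; Sketch only: a case-insensitive string comparison that defers
;; entirely to CHAR-CI=?.  The name MY-STRING-CI=? is hypothetical.
(define (my-string-ci=? a b)
  (and (= (string-length a) (string-length b))
       (let loop ((i 0))
         (or (= i (string-length a))
             (and (char-ci=? (string-ref a i) (string-ref b i))
                  (loop (+ i 1)))))))
```

Whatever CHAR-CI=? says about a pair of characters, this procedure (and so any identifier comparison built on it) inherits.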
If [a], [b], and [c] are all portable, equivalent, standard Scheme
programs then this portable, standard program:
(let loop ((c (read-char)))
  (if (not (eof-object? c))
      (begin
        (display (char-downcase c))
        (loop (read-char)))))
must be able to read any one of them and write as output a scheme
program with identical meaning, at _least_ if the resulting program is
read by the same implementation running the conversion.
There are two choices. Either that program is permitted to convert
[b] and [c] into something other than [a] (such as by including some
dotless i's in the output) or it must convert [b] and [c] to [a].
In the latter case, CHAR-DOWNCASE behaves in a linguistically odd manner
for Turkish speakers because it either converts #\I to #\i or #\I to #\I.
In the former case, the Turkish implementation must provide that:
(char-ci=? dotless-i #\i)
which is, again, linguistically odd.
> The question the reader needs to ask is "are these sequences of
> characters the same identifier".
Yes, and in R5RS that means "Are the constituent characters of the identifier
equal in a case independent sense?" The rest follows from that.
You say R5RS should not define identifier equivalence that way:
> > I'm not so sure that that's terrible (and my proposals
> > for R6RS reflect that assessment): those procedures are doomed to
> > behave in a linguistically odd manner for a substantial number of
> > reasons, in many other contexts besides Turkish implementations.
> So punt them. CHAR-UPCASE and CHAR-DOWNCASE are entirely unnecessary,
> and since they cannot be sensibly implemented, and are entirely
> unneeded, drop them!
The character casemappings would still need to be defined to specify
Scheme. Reifying that definition into Scheme in the form of those
procedures is only natural.
> > Rather, I propose that the standard character procedures be explicitly
> > related to both the syntax of portable standard Scheme and the syntax
> > of particular implementations. For example, R6RS should require that:
> > (char-downcase #\I) => #\i
> Why? R6RS should not have char-downcase at all.
The standard would still need to specify CHAR-DOWNCASE. It would
still need to be possible to write portable CHAR-DOWNCASE with
whatever machinery the standard did provide. There is no good reason
not to stick to the simple route of directly reifying
CHAR-DOWNCASE into Scheme. There is a very good reason to do so: so
that portable programs can accurately manipulate non-portable source texts.
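For illustration, this is roughly what a portable CHAR-DOWNCASE built from other machinery might look like. It is only a sketch: the name portable-char-downcase and the two table strings are invented here, and it assumes that only the 26 letters of portable Scheme syntax have case mappings, leaving every other character unchanged:

```scheme
;; Sketch only: a table-driven downcase for the 26 letters used in
;; portable Scheme syntax.  All three names below are hypothetical.
(define upper-letters "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
(define lower-letters "abcdefghijklmnopqrstuvwxyz")

(define (portable-char-downcase c)
  (let loop ((i 0))
    (cond ((= i (string-length upper-letters)) c)   ; not an upper-case letter
          ((char=? c (string-ref upper-letters i))
           (string-ref lower-letters i))            ; mapped by table position
          (else (loop (+ i 1))))))
```

Under that assumption (portable-char-downcase #\I) yields #\i regardless of locale, which is the behavior the proposed requirement asks of CHAR-DOWNCASE itself.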