This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 are here. Eventually, the entire history will be moved there, including any new messages.
> From: tb@xxxxxxxxxx (Thomas Bushnell, BSG)

> Tom Lord <lord@xxxxxxx> writes:

> > On the other hand, if [a], [b], and [c] are all portable, equivalent,
> > standard Scheme programs -- then in Turkish implementations,
> > CHAR-UPCASE, CHAR-DOWNCASE and friends must behave in a linguistically
> > odd manner.

> Not true!

> You can make [a], [b], and [c] all do the Right Thing, and not even
> *have* CHAR-UPCASE or CHAR-DOWNCASE at all!

> What they require is string-ci=? to behave Properly, in the contexts
> where the Scheme reader uses it.

CHAR-UPCASE and CHAR-DOWNCASE are mandatory in R5RS, and STRING-CI=? is defined in terms of CHAR-CI=?.

If [a], [b], and [c] are all portable, equivalent, standard Scheme programs, then this portable, standard program:

    (let loop ((c (read-char)))
      (if (not (eof-object? c))
          (begin
            (display (char-downcase c))
            (loop (read-char)))))

must be able to read any one of them and write as output a Scheme program with identical meaning, at _least_ if the resulting program is read by the same implementation that ran the conversion.

There are two choices. Either that program is permitted to convert [b] and [c] into something other than [a] (such as by including some dotless i's in the output), or it must convert [b] and [c] to [a].

In the latter case, CHAR-DOWNCASE behaves in a linguistically odd manner for Turkish speakers because it either converts #\I to #\i or #\I to #\I.

In the former case, the Turkish implementation must ensure that

    (char-ci=? dotless-i #\i)   ; => #t

which is, again, linguistically odd.

> The question the reader needs to ask is "are these sequences of
> characters the same identifier".

Yes, and in R5RS that means "Are the constituent characters of the identifiers equal in a case-independent sense?" The rest follows from that.
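The dependency this argument relies on, that case-insensitive string comparison is built from case-insensitive character comparison and therefore from the character casemapping, can be sketched as below. This is a minimal illustration, not R5RS text: MY-CHAR-CI=? and MY-STRING-CI=? are hypothetical names, and defining CHAR-CI=? by comparing CHAR-DOWNCASE results is an assumption about one plausible implementation.

```scheme
;; Sketch only: MY-CHAR-CI=? and MY-STRING-CI=? are hypothetical names.
;; Assumption: case-insensitive character comparison is implemented by
;; downcasing both characters and comparing them with CHAR=?.

(define (my-char-ci=? a b)
  ;; Whatever CHAR-DOWNCASE does to #\I decides what this returns,
  ;; and therefore what everything built on top of it returns.
  (char=? (char-downcase a) (char-downcase b)))

(define (my-string-ci=? s1 s2)
  ;; Equal ignoring case: same length, and every pair of corresponding
  ;; characters satisfies MY-CHAR-CI=?.
  (let ((n (string-length s1)))
    (and (= n (string-length s2))
         (let loop ((i 0))
           (or (= i n)
               (and (my-char-ci=? (string-ref s1 i) (string-ref s2 i))
                    (loop (+ i 1))))))))
```

Under this sketch, an implementation whose CHAR-DOWNCASE mapped #\I to a dotless i would make (my-string-ci=? "LIST" "list") false, so the identifiers LIST and list would no longer be the same: identifier equivalence cannot be separated from the casemapping.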
You say R5RS should not define identifier equivalence that way:

> > I'm not so sure that that's terrible (and my proposals
> > for R6RS reflect that assessment): those procedures are doomed to
> > behave in a linguistically odd manner for a substantial number of
> > reasons, in many other contexts besides Turkish implementations.

> So punt them. CHAR-UPCASE and CHAR-DOWNCASE are entirely unnecessary,
> and since they cannot be sensibly implemented, and are entirely
> unneeded, drop them!

The character casemappings would still need to be defined in order to specify Scheme. Reifying that definition into Scheme in the form of those procedures is only natural.

> > Rather, I propose that the standard character procedures be explicitly
> > related to both the syntax of portable standard Scheme and the syntax
> > of particular implementations. For example, R6RS should require that:
> >
> > (char-downcase #\I) => #\i

> Why? R6RS should not have char-downcase at all.

The standard would still need to specify CHAR-DOWNCASE, and it would still need to be possible to write a portable CHAR-DOWNCASE with whatever machinery the standard did provide. There is no good reason not to take the simple route of directly reifying CHAR-DOWNCASE into Scheme. There is a very good reason to do so: portable programs can then accurately manipulate non-portable source texts.

-t
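The point that a portable CHAR-DOWNCASE must remain writable with whatever machinery the standard provides can be illustrated with a table-driven sketch. The tables here are an assumption covering only the 26 ASCII letters, and TABLE-CHAR-DOWNCASE, UPPER-LETTERS and LOWER-LETTERS are hypothetical names; a given implementation would substitute its own casemapping tables, which is exactly the definition the message argues must be specified anyway.

```scheme
;; Sketch only: these names are hypothetical, and the tables are an
;; assumption covering just the ASCII letters A-Z / a-z.
(define upper-letters "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
(define lower-letters "abcdefghijklmnopqrstuvwxyz")

(define (table-char-downcase c)
  ;; Look C up in UPPER-LETTERS; if found, return the character at the
  ;; same index in LOWER-LETTERS, otherwise return C unchanged.
  (let ((n (string-length upper-letters)))
    (let loop ((i 0))
      (cond ((= i n) c)
            ((char=? c (string-ref upper-letters i))
             (string-ref lower-letters i))
            (else (loop (+ i 1)))))))
```

Because the mapping lives in data rather than in a built-in, the same portable code would implement a different casemapping if handed different tables; the casemapping itself still has to be defined somewhere.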