This page is part of the web mail archives of SRFI 75 from before July 7th, 2015.
(Aside: sorry to keep breaking threading. I've recently changed MUAs and for some reason the new setup is mucking up threaded replies. Hopefully it'll be fixed soon.)

me:

>> Therefore, if the character and string functions are "crude"
>> with respect to natural language, then an implementation
>> *can not* (cleanly, simply) allow identifier names which are
>> globally-natural-language-friendly except in a crude way.

John:

> Can you give an example? I don't understand how this principle
> applies. S-75 provides case-{in,}sensitive {character,string}
> {identity,collation} functions, and provides syntax for the full scope
> of Unicode scalar values as characters and USV sequences as strings.
> Furthermore, every character string can be mapped to a symbol and vice
> versa (excluding uninterned symbols, which are not part of the
> standard). What is more, identifiers are explicitly made case-
> sensitive, so the definition of the string-ci family
> no longer affects them.

Bob Unihacker wants to implement a Scheme in which two identifiers in the source text are identical if they are codepoint-equal under some chosen form of canonicalization. He argues that his particular canonicalization rule is the best. Under S-75, though, he cannot identify two distinct spellings of the same identifier: if he did, two different strings would have to convert to the same symbol in his implementation, and S-75 does not permit that.

Hence my proposed solution, which can be summarized: allow, heck, even require chars and strings to be Unicode-friendly; break the 1:1 string:symbol mapping; and do not standardize symbol names or identifier names other than those containing only 8-bit codepoints.

That leaves Bob Unihacker with a very un-lisplike problem to solve: his nice, fully Unicode source texts are no longer syntactically valid S-expressions. However, unlike under the current S-75, under my proposal Bob's problem is solvable -- in multiple ways.
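To make Bob's situation concrete, here is an illustrative sketch (in Python rather than Scheme, using the standard `unicodedata` module; the choice of NFC/NFKC as Bob's canonicalization rule is my assumption, not anything S-75 specifies). It shows two codepoint-distinct spellings that a canonicalizing reader would identify, and, by contrast, the cross-script lookalikes that normalization leaves distinct:

```python
import unicodedata

# Two source-text spellings of the same identifier "café":
composed   = "caf\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# They are different codepoint sequences...
assert composed != decomposed

# ...but a canonicalizing reader (here: Unicode NFC) identifies them.
# Under S-75's 1:1 string<->symbol mapping, these two distinct strings
# must name two distinct symbols, so Bob cannot treat them as one
# identifier without violating the SRFI.
assert (unicodedata.normalize("NFC", composed)
        == unicodedata.normalize("NFC", decomposed))

# Cross-script lookalikes, by contrast, survive even the more
# aggressive NFKC normalization: Latin A, Greek Alpha, and Cyrillic A
# remain three distinct characters, so this particular spoof is not
# closed by normalization alone.
lookalikes = ["\u0041", "\u0391", "\u0410"]
assert len({unicodedata.normalize("NFKC", c) for c in lookalikes}) == 3
```

This is only meant to show why "which canonicalization rule" and "which spoofs it closes" are separable questions for Bob and his competitors to experiment with.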
Thus, Bob and all of his competitors can each propose new higher-level char-like and string-like types, and/or new conversion rules between strings and symbols, etc. In other words, my proposal enables the kind of experimentation the editors would like to see and, indeed, requires that experimentation of Bob and friends.

> I don't see how we are forcing identifiers to be crude. We are
> permitting distinct identifiers that look exactly alike, yes.
> However, if we allow identifiers other than in Latin script at all,
> then such spoofs are always possible; to take only the simplest
> example, Latin A, Greek Alpha, and Cyrillic A look exactly alike.

I dispute that any standard requires those characters to be presented indistinguishably. Moreover, different choices of identifier-name canonicalization permit or deny different subsets of the possible spoofs. It is up to future experimenters to find the sweet spot for Scheme (though, certainly, consortium recommendations on the issue ought not to be ignored). So killing all possible spoofing isn't necessarily Bob's goal; picking and choosing which kinds of spoofing to allow is a good goal and, imo, an open question.

>> b) An analogous argument applies to the streams emitted and consumed
>>    by READ and WRITE. (This isn't *really* a separate point from
>>    (a) but people commonly treat it that way.)

> I don't understand this argument either, alas.

As I say above: code and data aren't really different in Scheme, although arguably that hasn't always been the case. READ and WRITE are generic ways to externalize and import data; they ought to agree with the reader that loads source text into an interpreter or compiler.

> Please spell out these implications (preferably with examples), as I
> remain entirely in the dark.

Does the above help? Perhaps I'm in the dark (in the sense of missing some point).

-t