RESET [was Re: Encodings]
I'm not explaining myself well.
Let me try to define a category by examples.
Let's say there are two scheme source files, each of which uses the "same"
identifier in the same global (module global) scope/context. We say that in
an RnRS Scheme the identifier names (denotes) the same value in both files.
Let's say the two files are stored in different encodings (say UTF-8 and
UCS-2) and processed by different but conforming Unicode systems (text
editors, Scheme read/write, whatever), so that the identifiers still appear
the same when displayed even though their stored bytes differ.
A Scheme implementation which properly reads the two files should end up with
both occurrences of the identifier described above represented by symbols
which are eq? (NB: _not_ eqv?) to each other. If not, I term this "broken".
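A minimal sketch of the property above, written in Python rather than Scheme
so that it can be run as-is. Everything named here is hypothetical: the
`Symbol` class, the `_symbols` table, and `read_identifier` are illustrations,
not any implementation's reader. Python's `is` stands in for eq?, and the
`utf-16-le` codec stands in for UCS-2 (the bytes agree for identifiers inside
the Basic Multilingual Plane).

```python
# Hypothetical symbol table: exactly one Symbol object per distinct name.
_symbols = {}


class Symbol:
    """Stand-in for a Scheme symbol; identity (`is`) plays the role of eq?."""

    def __init__(self, name):
        self.name = name


def intern_symbol(name):
    """Return the unique Symbol for this sequence of code points."""
    sym = _symbols.get(name)
    if sym is None:
        sym = _symbols[name] = Symbol(name)
    return sym


def read_identifier(raw_bytes, encoding):
    """Decode first, then intern: only code points reach the symbol table."""
    return intern_symbol(raw_bytes.decode(encoding))


# The "same" identifier as it would be stored in two differently encoded files.
utf8_bytes = "\u03bb-list".encode("utf-8")
ucs2_bytes = "\u03bb-list".encode("utf-16-le")

a = read_identifier(utf8_bytes, "utf-8")
b = read_identifier(ucs2_bytes, "utf-16-le")
same_symbol = a is b
```

The point of the sketch is only that the encoding is consumed at the decoding
step; if the reader interned raw bytes instead, the two occurrences would not
be the same object.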
[In the absence of reflection] one should be able to consistently replace all
occurrences of an identifier in the same scope without changing the meaning/
behavior of a program. If not, I term the situation "broken".
There are many concepts which come in paired/binary parts: on/off, up/down, et
cetera, which have no meaning without both parts. Up without down does not
[Q: If you call a tail a leg, how many legs does a dog have?
A: Four. Calling a tail a leg does not make it one].
[Q: I have a pencil which is at 60 degrees C. What is the temperature of an
atom in the pencil?
A: The question is meaningless. Temperature is an aggregate property of
molecules; it is not applicable to a single atom].
So if a glyph/character does not have a case variant, considering it to be
lower case makes no logical sense. I view this as an abuse of terminology.
Being outside of normal logic, I term this "bizarre" and if pressed, probably
"broken" as well.
So in all this discussion of multiple canonical forms (another misuse of
terminology, IMHO), multiple normal forms, et cetera, I am looking for a
description of how to keep  and  from being broken.
If satisfying the Unicode Standard means breaking , then I say "Don't do
Scheme is a programming language, not a "natural" language. Define a single
acceptable canonical/normal form in which the "same" identifiers, represented
as symbols, are always eq? to each other. Or define what an acceptable
encoding is, only accept that form, and let external tool(s) do the
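One way to picture the "single canonical form" option is to normalize every
identifier before interning it. The sketch below is in Python and assumes NFC
as the chosen form purely for illustration; this message does not name one.
`canonical_symbol` is a hypothetical helper, while `unicodedata.normalize` and
`sys.intern` are standard-library calls. Python's `is` again stands in for
eq?.

```python
import sys
import unicodedata


def canonical_symbol(name):
    """Normalize to one agreed form (NFC assumed here), then intern."""
    return sys.intern(unicodedata.normalize("NFC", name))


# Two spellings that display identically but differ in code points:
precomposed = "caf\u00e9"   # e-acute as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

raw_equal = precomposed == decomposed   # compared without normalization

s1 = canonical_symbol(precomposed)
s2 = canonical_symbol(decomposed)
same_symbol = s1 is s2                  # compared after normalize + intern
```

Without the normalization step the two spellings compare as different
strings; with it, both map to one interned object. Whether that step belongs
in the reader or in an external tool is exactly the choice posed above.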
I have not yet developed an operational model (a story) of how the above
works. This is the source of my confusion.
Can you help me out?