This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
I'm not explaining myself well. Let me try to define a category by examples.

[1] Let's say there are two Scheme source files, each of which uses the "same" identifier in the same global (module global) scope/context. We say that in an RnRS Scheme the identifier names or denotes the same value. Let's say the two files are stored in different encodings (say utf-8 and ucs-2) and processed by different but conforming Unicode systems (text editors, Scheme read/write, whatever), so that the identifiers still appear the same when displayed but are stored as different byte sequences. A Scheme implementation which properly reads the two files should end up with the identifier occurrences described above represented by symbols which are eq? (NB: _not_ eqv?) to each other. If not, I term this "broken".

[2] [In the absence of reflection] one should be able to consistently replace all occurrences of an identifier in the same scope without changing the meaning or behavior of a program. If not, I term the situation "broken".

[3] There are many concepts which come in paired/binary parts: on/off, up/down, et cetera, which have no meaning without both parts. Up without down does not make sense.

[Q: If you call a tail a leg, how many legs does a dog have? A: Four. Calling a tail a leg does not make it one.]

[Q: I have a pencil which is at 60 degrees C. What is the temperature of an atom in the pencil? A: The question is meaningless. Temperature is an aggregate property of molecules; it is not applicable to a single atom.]

So if a glyph/character does not have a case variant, considering it to be lower case makes no logical sense. I view this as an abuse of terminology. Being outside of normal logic, I term this "bizarre" and, if pressed, probably "broken" as well.

So in all this discussion of multiple canonical forms (another misuse of terminology, IMHO), multiple normal forms, et cetera, I am looking for a description of how to keep [1] and [2] from being broken.
If satisfying the Unicode Standard means breaking [1], then I say "Don't do that!". Scheme is a programming language, not a "natural" language. Define a single acceptable canonical/normal form in which the "same" identifiers, represented as symbols, are always eq? to each other. Or define what an acceptable encoding is, only accept that form, and let external tool(s) do the processing.

I have as yet not developed an operational model (a story) of how the above works. This is the source of my confusion. Can you help me out?

Thanks,
-KenD
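
One possible operational story for [1], sketched in Python rather than Scheme purely for illustration: decode each file from its own encoding, fold every identifier to a single normal form, and intern the result so that the "same" identifier always yields the same object. The choice of NFC as that single form, the helper name read_identifier, and the sample identifier are assumptions made for this sketch; nothing here is taken from SRFI 52 or from any Scheme implementation. Python's `is` plays the role of eq?, and UTF-16 stands in for ucs-2 (the two agree for the characters used here).

```python
import sys
import unicodedata


def read_identifier(raw: bytes, encoding: str) -> str:
    """Hypothetical reader step: decode, normalize to NFC, then intern."""
    text = raw.decode(encoding)
    normalized = unicodedata.normalize("NFC", text)
    # Interning guarantees one shared object per distinct normalized name.
    return sys.intern(normalized)


# File A: precomposed e-acute (U+00E9), stored as UTF-8.
file_a = "caf\u00e9".encode("utf-8")
# File B: "e" followed by a combining acute accent (U+0301), stored as UTF-16.
file_b = "cafe\u0301".encode("utf-16-le")

sym_a = read_identifier(file_a, "utf-8")
sym_b = read_identifier(file_b, "utf-16-le")

print(file_a == file_b)  # False: the stored bytes differ
print(sym_a is sym_b)    # True: one interned object, the analogue of eq?
```

Under this model the encoding and the composed/decomposed spelling are both erased at read time, so the two occurrences cannot end up as distinct symbols. Whether a Scheme reader should do this itself or leave it to external tools is exactly the open choice described above.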