
RESET [was Re: Encodings]

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.

I'm not explaining myself well.  

Let me try to define a category by examples. 


[1] Let's say there are two Scheme source files, each of which uses the "same" 
identifier in the same global (module global) scope/context.  We say that in 
an RnRS Scheme the identifier names or denotes the same value.

Let's say the two files are stored in different encodings (say UTF-8 and 
UCS-2) and processed by different but conforming Unicode systems (text 
editors, Scheme read/write, whatever) so that the identifiers still appear the 
same when displayed but are stored as different byte sequences.

A Scheme implementation which properly reads the two files should end up with 
the identifier occurrences denoted above represented by symbols which are eq?  
(NB: _not_ eqv?) to each other.  If not, I term this "broken".
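
As a rough illustration of the property in question, here is a minimal 
sketch in Python rather than Scheme.  It assumes the reader decodes each file 
to code points before building symbols; sys.intern stands in for a symbol 
table, and the identifier and the helper read_symbol are hypothetical.

```python
import sys

# The "same" identifier as it might sit on disk in two encodings.
identifier = "na\u00efve-\u03bb"             # hypothetical identifier
utf8_bytes = identifier.encode("utf-8")
ucs2_bytes = identifier.encode("utf-16-be")  # same as UCS-2 for BMP-only text


def read_symbol(raw: bytes, encoding: str) -> str:
    """Decode bytes to code points, then intern the result (symbol-table stand-in)."""
    return sys.intern(raw.decode(encoding))


sym_a = read_symbol(utf8_bytes, "utf-8")
sym_b = read_symbol(ucs2_bytes, "utf-16-be")

print(utf8_bytes != ucs2_bytes)  # True: the stored bytes differ
print(sym_a is sym_b)            # True: identity, the analogue of eq?
```

The point of the sketch is only that identity is decided after decoding, so 
the on-disk encoding never reaches the symbol table.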


[2] In the absence of reflection, one should be able to consistently replace 
all occurrences of an identifier in the same scope with another (unused) 
identifier without changing the meaning/behavior of a program.  If not, I term 
the situation "broken".


There are many concepts which come in paired/binary parts: on/off, up/down, et 
cetera, which have no meaning without both parts.  Up without down does not 
make sense.  

[Q: If you call a tail a leg, how many legs does a dog have?
 A: Four. Calling a tail a leg does not make it one].

[Q: I have a pencil which is at 60 degrees C.  What is the temperature of an 
atom in the pencil?
 A: The question is meaningless.  Temperature is an aggregate property of 
molecules; it is not applicable to a single atom].

So if a glyph/character does not have a case variant, considering it to be 
lower case makes no logical sense.  I view this as an abuse of terminology.  
Since it falls outside normal logic, I term it "bizarre" and, if pressed, 
probably "broken" as well.
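
The situation can be observed with a short Python sketch using the standard 
unicodedata module.  The helper has_case_variant is a hypothetical name, and 
"has a case variant" is assumed here to mean that upper- or lower-casing the 
character changes it.  U+0138 (LATIN SMALL LETTER KRA) is used as an example 
of a character categorized as a lowercase letter (Ll) with no uppercase 
mapping.

```python
import unicodedata


def has_case_variant(ch: str) -> bool:
    """True if upper- or lower-casing the character yields something different."""
    return ch.upper() != ch or ch.lower() != ch


# An ordinary letter, a digit, a CJK ideograph, and U+0138.
for ch in ("a", "7", "\u4e2d", "\u0138"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), has_case_variant(ch))
```

For U+0138 the category printed is Ll even though the helper reports no case 
variant, which is the kind of labeling objected to above.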

So in all this discussion of multiple canonical forms (another misuse of 
terminology, IMHO), multiple normal forms, et cetera, I am looking for a 
description of how to keep [1] and [2] from being broken.

If satisfying the Unicode Standard means breaking [1], then I say "Don't do 

Scheme is a programming language, not a "natural" language.  Define a single 
acceptable canonical/normal form in which the "same" identifiers, represented 
as symbols, are always eq? to each other.  Or define what an acceptable 
encoding is, only accept that form, and let external tool(s) do the 

I have as yet not developed an operational model (a story) of how the above 
works.  This is the source of my confusion.
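
One possible shape for such a story, offered only as a sketch in Python: 
pick a single normal form, normalize every identifier to it on input, and 
intern the result.  NFC is an assumed choice here, not something settled in 
this thread, and intern_identifier is a hypothetical helper.

```python
import sys
import unicodedata


def intern_identifier(text: str) -> str:
    """Normalize to NFC (assumed single normal form), then intern the result."""
    return sys.intern(unicodedata.normalize("NFC", text))


composed = "caf\u00e9"     # e-acute as one precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

print(composed == decomposed)  # False: different code point sequences
print(intern_identifier(composed) is intern_identifier(decomposed))  # True
```

Under this assumption, two spellings that display identically end up as one 
interned object, which is the eq? behavior asked for above.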

Can you help me out?