This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
On Fri, Feb 13, 2004 at 07:51:49AM +0100, Ken Dickey wrote:
>>> What about creating a tool which reads bizarre Unicode and writes it
>>> out in a canonical format? Then requiring portable Scheme programs to
>>> pass through it?

> Bradd W. Szonye wrote:
>> That wouldn't help unless they agree to write the *same* canonical
>> format. Besides, this is just separating part of the reader's job into
>> an external program, and in an error-prone way.

> I think there is again confusion between processing Unicode data and
> reading Scheme programs.
>
> Let's say that there is a Scheme SRFI (or even, *GASP*, a standard)
> which picks a single canonical Unicode form (say the most compact
> one) and requires, where Unicode is used, that Scheme programs be
> prepared in that format ....

Such a program would not conform to the Unicode standard:

    C9. A process shall not assume that the interpretations of two
        canonical-equivalent character sequences are distinct.

This section goes on to concede that

    Ideally, an implementation would always interpret two
    canonical-equivalent character sequences identically. There are
    practical circumstances under which implementations may reasonably
    distinguish them.

For example, a program may implement an earlier version of the standard,
and therefore not recognize that newer sequences are supposed to be
canonically equivalent.

However, a program that implemented Unicode in the way you suggest would
be perversely ignorant, much like Bear's example of a Scheme reader that
only case-folded the letters from A..Z.

In other words, recognizing canonically-equivalent characters *is* the
responsibility of the reader, if it claims to implement the Unicode
character set. If you view the combined converter and reader as a single
"program," then you might technically conform to the standard, but it
would be a perverse conformance, and therefore undesirable.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd
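
To make the C9 point concrete, here is a minimal sketch of what "the reader treats canonical-equivalent sequences identically" could look like. It is written in Python rather than Scheme, and it is only an illustration: the helper name same_identifier and the choice of NFC as the comparison form are assumptions made for this example, not something SRFI 52 or the message above specifies.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers up to canonical equivalence.

    Hypothetical helper: both inputs are normalized to NFC before
    comparison, so the caller never has to pre-normalize the source.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Two canonical-equivalent spellings of the same identifier.
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# A code-point comparison tells them apart, which is the behaviour
# C9 says a conforming process must not rely on.
print(precomposed == decomposed)

# A comparison that normalizes first treats them as the same identifier.
print(same_identifier(precomposed, decomposed))
```

The normalization happens inside the comparison itself, which mirrors the argument that equivalence is the reader's job and not that of a separate preprocessing tool.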