
Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.



On Fri, Feb 13, 2004 at 07:51:49AM +0100, Ken Dickey wrote:
>>> What about creating a tool which reads bizarre Unicode and writes it
>>> out in a canonical format?  Then requiring portable Scheme programs to
>>> pass through it?

> Bradd W. Szonye wrote:
>> That wouldn't help unless they agree to write the *same* canonical
>> format. Besides, this is just separating part of the reader's job into
>> an external program, and in an error-prone way.

> I think there is again confusion between processing Unicode data and
> reading Scheme programs.
> 
> Let's say that there is a Scheme SRFI (or even, *GASP*, a standard)
> which picks a single canonical Unicode form (say the most compact
> one) and requires, where Unicode is used, that Scheme programs be
> prepared in that format ....

Such a program would not conform to the Unicode standard:

    C9. A process shall not assume that the interpretations of two
        canonical-equivalent character sequences are distinct.

This section goes on to concede that

    Ideally, an implementation would always interpret two
    canonical-equivalent character sequences identically. There are
    practical circumstances under which implementations may reasonably
    distinguish them.

For example, a program may implement an earlier version of the standard,
and therefore not recognize that newer sequences are supposed to be
canonically equivalent. However, a program that implemented Unicode in
the way you suggest would be perversely ignorant, much like Bear's
example of a Scheme reader that only case-folded the letters from A..Z.
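
As a rough illustration of that case-folding analogy, here is a minimal
sketch in Python rather than Scheme (an assumption made only for the
example). The helper name `ascii_only_fold` is hypothetical; it folds
A..Z and nothing else, and is compared against Python's built-in
Unicode-aware `str.casefold`.

```python
def ascii_only_fold(s: str) -> str:
    """Hypothetical naive folding: lowercase A..Z only, leave the rest alone."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

name = "\u00c9cole"            # "École", starting with LATIN CAPITAL LETTER E WITH ACUTE

naive = ascii_only_fold(name)  # the non-ASCII capital is left untouched
full = name.casefold()         # Unicode-aware folding lowercases it as well
```

The naive fold works for identifiers such as `LAMBDA` but treats `École`
and `école` as different names, which is the kind of gap described above.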

In other words, recognizing canonical-equivalent sequences *is* the
responsibility of the reader, if it claims to implement the Unicode
character set. If you view the combined converter and reader as a single
"program," then you might technically conform to the standard, but it
would be a perverse conformance, and therefore undesirable.
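
To make the point concrete, here is a minimal sketch, again in Python
rather than Scheme (an assumption for illustration only). The helper
`same_identifier` is hypothetical and stands in for whatever comparison a
reader performs on identifiers; NFC is chosen arbitrarily, and any single
normalization form applied to both sides would serve.

```python
import unicodedata

# Two canonical-equivalent spellings of the same character:
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

def same_identifier(a: str, b: str) -> bool:
    """Hypothetical reader helper: compare names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

raw_equal = precomposed == decomposed                        # code point comparison
normalized_equal = same_identifier(precomposed, decomposed)  # normalized comparison
```

Comparing code points directly treats the two spellings as distinct,
while the normalized comparison treats them as the same, which is the
behavior C9 asks a conforming process not to rule out.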
-- 
Bradd W. Szonye
http://www.szonye.com/bradd