
Re: the "Unicode Background" section

Thomas Lord scripsit:

> This is a fairly radical proposal.   It means, for example,
> the READ-CHAR will never know squat about UTF-8:  READ-CHAR
> is doomed, under my suggestions, to remain forever a low-level
> procedure.

"Radical" is not the word.  It means that a conformant Scheme will be
*compelled* to interpret text files as Latin-1, even on systems where
that is not the native encoding, unless the user or the system interposes
an interpretive layer that cleans up the characters.

Why privilege Latin-1 in such a fashion?  It's not even the native
encoding of the majority of systems out there.  It merely happens to
be the encoding that contains the bottom 256 Unicode codepoints.
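To make the consequence concrete, here is a minimal sketch in Python, standing in for a Scheme reader. The function name read_chars_as_latin1 is hypothetical; it only models a READ-CHAR that turns each byte into the code point with the same number, which is exactly what decoding as Latin-1 does.

```python
def read_chars_as_latin1(data: bytes) -> str:
    # Hypothetical model of a byte-at-a-time READ-CHAR:
    # byte value n becomes the character with code point n.
    return "".join(chr(b) for b in data)

# For every possible byte, this agrees with a Latin-1 decode.
all_bytes = bytes(range(256))
assert read_chars_as_latin1(all_bytes) == all_bytes.decode("latin-1")

# A UTF-8 file containing U+00E9 (e with acute) holds two bytes, C3 A9.
utf8_bytes = "\u00e9".encode("utf-8")
misread = read_chars_as_latin1(utf8_bytes)

# The reader reports two characters, U+00C3 and U+00A9, not one U+00E9.
print([hex(ord(c)) for c in misread])
```

Unless some layer above re-decodes the result, a program on a UTF-8 system sees two characters where the file's author wrote one.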

> On the other hand, it's upward compatible and sets a stage
> for experimentation re I/O paradigms.  (Upward compat with
> the standard, not implementations -- the divergence being
> over how procedures are named, not what they do.)

It prescribes behavior that the standard did not; for example, it
compels a 0x80 byte to be interpreted as U+0080, though on most
Windows systems (code page 1252) 0x80 encodes U+20AC, the Euro sign.
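The divergence is easy to check. The following is a small Python sketch, assuming "cp1252" is the code page in question on the Windows systems mentioned; it decodes the same single byte under both encodings.

```python
byte = b"\x80"

# Latin-1: the byte value is the code point, so 0x80 is the control U+0080.
as_latin1 = byte.decode("latin-1")

# Windows code page 1252: the same byte is the Euro sign, U+20AC.
as_cp1252 = byte.decode("cp1252")

print(f"latin-1: U+{ord(as_latin1):04X}")
print(f"cp1252:  U+{ord(as_cp1252):04X}")
```

The same file therefore yields different characters depending on which reading the standard mandates, which is the sense in which the proposal prescribes behavior the standard previously left open.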

Ambassador Trentino: I've said enough. I'm a man of few words.
Rufus T. Firefly: I'm a man of one word: scram!
        --Duck Soup                     John Cowan <jcowan@xxxxxxxxxxxxxxxxx>