Re: the "Unicode Background" section

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



Thomas Lord scripsit:

> This is a fairly radical proposal.   It means, for example,
> the READ-CHAR will never know squat about UTF-8:  READ-CHAR
> is doomed, under my suggestions, to remain forever a low-level
> procedure.

"Radical" is not the word.  It means that a conformant Scheme will be
*compelled* to interpret text files as Latin-1, even on systems where
that is not the native encoding, unless the user or the system interposes
an interpretive layer that cleans up the characters.

Why privilege Latin-1 in such a fashion?  It's not even the native
encoding of the majority of systems out there.  It merely happens to
be the encoding that contains the bottom 256 Unicode codepoints.
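To make the point concrete, here is a small Python sketch (an illustration only, not anything taken from the SRFI or from the proposal under discussion). It assumes a made-up one-letter sample text. Decoding as Latin-1 simply maps each byte value to the code point with the same number, so a UTF-8 file read that way turns every multi-byte character into several unrelated ones.

```python
# Latin-1 maps byte value n to code point n for every n in 0..255.
identity = all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))
print(identity)

# Hypothetical file contents: one accented letter, stored as UTF-8.
utf8_bytes = "\u00e9".encode("utf-8")      # two bytes: 0xC3 0xA9

# Reading those bytes as Latin-1 yields one character per byte.
misread = utf8_bytes.decode("latin-1")
print(len(misread), [hex(ord(c)) for c in misread])
```

Recovering the intended character would take exactly the kind of interposed interpretive layer described above.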

> On the other hand, it's upward compatible and sets a stage
> for experimentation re I/O paradigms.  (Upward compat with
> the standard, not implementations -- the divergence being
> over how procedures are named, not what they do.)

It prescribes behavior that the standard did not; for example, it
compels a 0x80 byte to be interpreted as U+0080, though on most
Windows systems 0x80 encodes U+20AC, the Euro sign.
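
The difference is easy to see in a short Python sketch (again only an illustration; the helper name `codepoint` is my own, and "cp1252" is Python's codec name for the usual Western European Windows code page):

```python
# The single byte 0x80, decoded under two different assumptions.
raw = bytes([0x80])

as_latin1 = raw.decode("latin-1")   # byte value maps directly to the code point
as_cp1252 = raw.decode("cp1252")    # Windows-1252 assigns 0x80 to the Euro sign

def codepoint(ch):
    """Format a one-character string in U+XXXX notation."""
    return "U+%04X" % ord(ch)

print(codepoint(as_latin1))
print(codepoint(as_cp1252))
```

The same byte yields a C1 control character under one reading and a currency symbol under the other, so the choice of encoding cannot be left implicit.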

-- 
Ambassador Trentino: I've said enough. I'm a man of few words.
Rufus T. Firefly: I'm a man of one word: scram!
        --Duck Soup                     John Cowan <jcowan@xxxxxxxxxxxxxxxxx>