Re: the "Unicode Background" section

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



Thomas Lord scripsit:

> I think it might be realistic to label ports not with
> the encoding scheme they want, but with the set of 
> code-values they can transmit -- in other words
> with their framing constraints.   In other words -- 
> a "UTF-8 port" (no such thing, really) and an "ASCII port"
> (no such thing, again) are *really* just "8-bit ports".
> A "UTF-16 port" is *really* just a "16-bit port".

The difficulty here is that an ISO-8859-1 port {produces,accepts} a
different set of characters from an ISO-8859-2 port.  Unless a port is
labeled with an encoding, you can't know what characters it will and
won't {accept,produce}, and you are stuck with some system default.
Even a 16-bit port behaves differently depending on whether it is
a UTF-16 port, a UTF-16LE port, or a UTF-16BE port.
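To make the point concrete, here is a minimal sketch in Python (not Scheme, and not part of the SRFI), using Python's built-in codecs as stand-ins for port encodings. The helper name `encodable` is hypothetical. It shows two 8-bit encodings with different repertoires, and three 16-bit encoding schemes that produce different octets for the same character.

```python
def encodable(ch: str, encoding: str) -> bool:
    """Return True if ch is in the repertoire of the named encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Both encodings use 8-bit framing, yet their repertoires differ:
# U+0142 (l with stroke) exists only in ISO-8859-2,
# U+00F1 (n with tilde) exists only in ISO-8859-1.
l_stroke_latin1 = encodable("\u0142", "iso-8859-1")
l_stroke_latin2 = encodable("\u0142", "iso-8859-2")
n_tilde_latin1 = encodable("\u00f1", "iso-8859-1")
n_tilde_latin2 = encodable("\u00f1", "iso-8859-2")

# The same octet also means different characters on input.
as_latin1 = b"\xb3".decode("iso-8859-1")  # U+00B3, superscript three
as_latin2 = b"\xb3".decode("iso-8859-2")  # U+0142, l with stroke

# 16-bit framing alone does not fix the octets either.
utf16_be = "A".encode("utf-16-be")  # big-endian, no byte order mark
utf16_le = "A".encode("utf-16-le")  # little-endian, no byte order mark
utf16 = "A".encode("utf-16")        # byte order mark, then native order
```

Knowing only that a port is "8-bit" or "16-bit" is not enough to predict any of these results.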

I'm not saying that any Scheme system has to accept every possible
encoding (though I do think at least ASCII, UTF-8, and UTF-16 should
be mandatory; they are all trivial), but it needs to be possible
to specify the encoding of a port when it is created.  (I don't think
it's necessary to be able to change it on the fly, though.)

> At the same time, several of us agree that WRITE-CHAR
> should accept a CHAR argument which is, in essence, a
> codepoint.

In which case it is the output port's encoding that says what octets
to write.
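A rough sketch of that division of labour, again in Python rather than Scheme: the encoding is fixed when the port is created, the caller hands over a character (a code point), and the port decides the octets. The helpers `open_output_port` and `octets_written` are hypothetical names used only for illustration; `io.TextIOWrapper` stands in for a Scheme port.

```python
import io

def open_output_port(encoding: str):
    """Hypothetical helper: create an in-memory output port whose
    encoding is chosen once, at creation time."""
    sink = io.BytesIO()
    port = io.TextIOWrapper(sink, encoding=encoding,
                            errors="strict", newline="")
    return port, sink

def octets_written(ch: str, encoding: str) -> bytes:
    """Write one character to a fresh port and return the octets
    that reached the underlying byte sink."""
    port, sink = open_output_port(encoding)
    port.write(ch)   # the caller supplies only the character
    port.flush()     # push the encoded octets down to the sink
    return sink.getvalue()

# One code point, U+00E9, three different octet sequences.
as_utf8 = octets_written("\u00e9", "utf-8")
as_latin1 = octets_written("\u00e9", "iso-8859-1")
as_utf16be = octets_written("\u00e9", "utf-16-be")
```

The writing code never mentions octets; only the port's label does.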

> I think an implementation should be permitted to have a
> version of WRITE-CHAR which is not total for all PORT,
> CHAR pairs:  try to write a wide character on an 8-bit
> port and that's an error, etc.

Absolutely.  Or more specifically: an attempt to write a character that's
not in the repertoire associated with the port's encoding is an error.
Allowing this to be lax is just asking for trouble.
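As an illustration of the difference between strict and lax behaviour, here is a small Python sketch (an analogy, not a proposal for the SRFI). The function `write_char` is a hypothetical stand-in for WRITE-CHAR; the `errors` argument of `io.TextIOWrapper` selects the policy.

```python
import io

def write_char(port, ch: str) -> None:
    """Hypothetical stand-in for WRITE-CHAR on a text port."""
    port.write(ch)
    port.flush()

# Strict port: a character outside the ASCII repertoire is an error.
strict_sink = io.BytesIO()
strict_port = io.TextIOWrapper(strict_sink, encoding="ascii",
                               errors="strict")
try:
    write_char(strict_port, "\u0142")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Lax port: the same character is silently replaced by "?",
# and the reader of the output cannot tell that data was lost.
lax_sink = io.BytesIO()
lax_port = io.TextIOWrapper(lax_sink, encoding="ascii",
                            errors="replace")
write_char(lax_port, "\u0142")
lax_bytes = lax_sink.getvalue()
```

The lax port produces valid-looking output that no longer says what the program meant, which is the trouble being asked for.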

Given that strict base layer, it's easy to build a higher-level
abstraction on top of it that will {write,read} otherwise impossible
characters using some encoding scheme of its own.
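One possible shape for such a layer, sketched in Python under stated assumptions: the escape syntax `\x<hex>;` is picked purely for illustration, the names `write_escaped` and `read_escaped` are hypothetical, and the underlying encoding is assumed to be stateless and able to represent the ASCII characters used by the escapes.

```python
import re

def write_escaped(text: str, encoding: str) -> bytes:
    """Encode text strictly, escaping what the encoding cannot hold."""
    out = bytearray()
    for ch in text:
        if ch == "\\":
            # Double the backslash so the escapes stay unambiguous.
            out += "\\\\".encode(encoding)
            continue
        try:
            out += ch.encode(encoding)  # strict by default
        except UnicodeEncodeError:
            # Outside the repertoire: fall back to a hex escape.
            out += f"\\x{ord(ch):x};".encode(encoding)
    return bytes(out)

# Matches either a doubled backslash or a \x<hex>; escape.
_ESCAPE = re.compile(r"\\(?:\\|x([0-9a-fA-F]+);)")

def read_escaped(data: bytes, encoding: str) -> str:
    """Decode octets and turn the escapes back into characters."""
    def unescape(match):
        hex_digits = match.group(1)
        return chr(int(hex_digits, 16)) if hex_digits else "\\"
    return _ESCAPE.sub(unescape, data.decode(encoding))

original = "z\u0142oty \\ 5"
wire = write_escaped(original, "ascii")
round_trip = read_escaped(wire, "ascii")
```

The strict error from the lower layer is what tells the upper layer when an escape is needed, so the two fit together without the port itself ever being lax.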

-- 
Some people open all the Windows;       John Cowan
wise wives welcome the spring           jcowan@xxxxxxxxxxxxxxxxx
by moving the Unix.                     http://www.reutershealth.com
  --ad for Unix Book Units (U.K.)       http://www.ccil.org/~cowan
        (see http://cm.bell-labs.com/cm/cs/who/dmr/unix3image.gif)