
Re: the "Unicode Background" section

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

Thomas Lord scripsit:

> There is no reason I can see why the implementation must provide
> the kind of labeling you like.   I have no trouble imagining
> many applications that don't need it --- they either won't
> need the system to enforce a restriction because they'll know
> what their ports are talking to or they are robust in the case
> that they get wrong what the ports expect.    The goal
> here isn't to try to prevent programmers from making mistakes.

Granted.  But if an {input,output} port doesn't know its encoding, it
doesn't know how to translate {characters,bytes} to {bytes,characters}
at all.  It's not a matter of overcoming restrictions -- it's fundamental.
An output port needs to know, e.g., whether to output #\u0131 (dotless i)
as an 0xB9 byte (ISO 8859-3) or an 0xFD (ISO 8859-9) or an 0xB8 (ANSEL)
or not at all.  (The same mutatis mutandis for an input port.)
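To make the dotless-i example concrete, here is a minimal sketch. It is written in Python purely for illustration, since the thread concerns Scheme ports. It uses io.TextIOWrapper as a stand-in for a character port layered on an octet port, and the helper names write_through_port and read_through_port are hypothetical. The same character yields different octets depending on the port's encoding, and the same octet decodes to different characters. ANSEL is left out because Python's standard library ships no codec for it.

```python
import io


def write_through_port(text, encoding):
    """Write text via a character port fixed to `encoding`; return the octets."""
    raw = io.BytesIO()  # the underlying octet port
    port = io.TextIOWrapper(raw, encoding=encoding, newline="")
    port.write(text)    # raises UnicodeEncodeError if `encoding` lacks a character
    port.flush()
    data = raw.getvalue()
    port.detach()       # release the octet port without closing it
    return data


def read_through_port(data, encoding):
    """Read all characters from `data` via a character port fixed to `encoding`."""
    port = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return port.read()


if __name__ == "__main__":
    # Output: same character, different octets.
    print(write_through_port("\u0131", "iso8859_3"))       # b'\xb9'
    print(write_through_port("\u0131", "iso8859_9"))       # b'\xfd'
    # Input: same octet, different characters.
    print(ascii(read_through_port(b"\xb9", "iso8859_3")))  # '\u0131' (dotless i)
    print(ascii(read_through_port(b"\xb9", "iso8859_9")))  # '\xb9' (superscript one)
    # No representation at all.
    try:
        write_through_port("\u0131", "latin-1")
    except UnicodeEncodeError:
        print("U+0131 cannot be written in ISO 8859-1")
```

Without the encoding argument, neither helper could decide what to emit or return. That is the sense in which the encoding is fundamental rather than a restriction.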

> For example, a common suggestion is that you specify when
> first creating a port what encoding it is.  Well, what if
> I hope to send traffic over that port that will use different
> encodings at different points during the run?  Clearly labeling
> is not trivial -- I'd say, not well understood.

That can happen on occasion, but it's a highly specialized case that can
be layered directly over an octet port.  Specifying a fixed encoding
for a character port makes straightforward things easy.  This is far
from being rocket science.
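The layering described above might look like the following minimal sketch. It is again in Python and purely illustrative: SwitchableEncodingWriter and its methods are hypothetical names, not part of SRFI 75 or any Scheme system. The octet port remains the primary object and never knows about encodings. Each change of encoding first flushes the old encoder's pending state, which matters for stateful encodings such as ISO-2022-JP.

```python
import codecs
import io


class SwitchableEncodingWriter:
    """Hypothetical character writer layered over an octet (binary) port.

    The encoding can be changed mid-stream; the octet port itself is never
    told about encodings at all.
    """

    def __init__(self, octets, encoding):
        self.octets = octets  # any binary stream, e.g. io.BytesIO
        self._encoder = codecs.getincrementalencoder(encoding)()

    def _finish_current(self):
        # Emit any pending state of the current encoder (e.g. an ISO-2022
        # shift back to ASCII) before it is replaced or the stream ends.
        self.octets.write(self._encoder.encode("", final=True))

    def set_encoding(self, encoding):
        """Switch encodings; affects only characters written after this call."""
        self._finish_current()
        self._encoder = codecs.getincrementalencoder(encoding)()

    def write(self, text):
        self.octets.write(self._encoder.encode(text))

    def close(self):
        self._finish_current()


if __name__ == "__main__":
    raw = io.BytesIO()
    w = SwitchableEncodingWriter(raw, "ascii")
    w.write("name: ")
    w.set_encoding("iso8859_9")
    w.write("\u0131")        # one octet, 0xFD
    w.set_encoding("utf-8")
    w.write("\u0131")        # two octets, 0xC4 0xB1
    w.close()
    print(raw.getvalue())    # b'name: \xfd\xc4\xb1'
```

A character port with one fixed encoding, such as io.TextIOWrapper(raw, encoding=...), covers the ordinary case. Only the rare mixed-encoding stream needs something like this wrapper.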

John Cowan  jcowan@xxxxxxxxxxxxxxxxx  www.ccil.org/~cowan  www.reutershealth.com
If he has seen farther than others,
        it is because he is standing on a stack of dwarves.
                --Mike Champion, describing Tim Berners-Lee (adapted)