
Re: the "Unicode Background" section

Thomas Lord scripsit:

> There is no reason I can see why the implementation must provide
> the kind of labeling you like.   I have no trouble imagining
> many applications that don't need it --- they either won't
> need the system to enforce a restriction because they'll know
> what their ports are talking to or they are robust in the case
> that they get wrong what the ports expect.    The goal
> here isn't to try to prevent programmers from making mistakes.

Granted.  But if an {input,output} port doesn't know its encoding, it
doesn't know how to translate {bytes,characters} to {characters,bytes}
at all.  It's not a matter of overcoming restrictions -- it's fundamental.
An output port needs to know, e.g., whether to output #\u0131 (dotless i)
as an 0xB9 byte (ISO 8859-3) or an 0xFD (ISO 8859-9) or an 0xB8 (ANSEL)
or not at all.  (The same mutatis mutandis for an input port.)
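
To make that concrete, here is a minimal sketch in Python (not Scheme, and
not part of any proposed port API) that encodes the same character under
different encodings.  It assumes Python's standard iso8859_3 and iso8859_9
codecs; ANSEL is left out because the Python standard library ships no
ANSEL codec, and the helper name encode_char is purely illustrative.

```python
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def encode_char(ch, encoding):
    """Return the bytes for ch in the given encoding, or None if the
    encoding has no representation for it (hypothetical helper)."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


# Same character, different bytes depending on the port's encoding;
# Latin-1 has no dotless i at all, so nothing can be written.
for name in ("iso8859_3", "iso8859_9", "latin_1"):
    print(name, encode_char(DOTLESS_I, name))
```

Without knowing which of these encodings applies, there is no way to pick
a byte, which is the sense in which the question is fundamental.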

> For example, a common suggestion is that you specify when
> first creating a port what encoding it is.  Well, what if
> I hope to send traffic over that port that will use different
> encodings at different points during the run?  Clearly labeling
> is not trivial -- I'd say, not well understood.

That can happen on occasion, but it's a highly specialized case that can
be layered directly over an octet port.  Specifying a fixed encoding
for a character port makes straightforward things easy.  This is far
from being rocket science.
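
A rough sketch of both cases, again in Python rather than Scheme and only
as an illustration: io.BytesIO stands in for an octet port and
io.TextIOWrapper for a character port with a fixed encoding.  The
SwitchingWriter class is a hypothetical name invented here to show how
the mixed-encoding case might be layered over the octet port.

```python
import io

# Common case: the encoding is fixed when the character port is created.
octets = io.BytesIO()
char_port = io.TextIOWrapper(octets, encoding="iso8859_9")
char_port.write("\u0131")
char_port.flush()  # push the encoded bytes down to the octet port
fixed_result = octets.getvalue()


class SwitchingWriter:
    """Hypothetical layer over an octet port whose encoding may change
    between writes; each write is encoded explicitly."""

    def __init__(self, octet_port, encoding):
        self.octet_port = octet_port
        self.encoding = encoding

    def write(self, text):
        self.octet_port.write(text.encode(self.encoding))


# Specialized case: the application switches encodings during the run.
raw = io.BytesIO()
writer = SwitchingWriter(raw, "iso8859_3")
writer.write("\u0131")
writer.encoding = "iso8859_9"
writer.write("\u0131")
mixed_result = raw.getvalue()

print(fixed_result, mixed_result)
```

The specialized case costs a few lines on top of the octet port, while
the fixed-encoding port needs no extra machinery for ordinary use.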

John Cowan  jcowan@xxxxxxxxxxxxxxxxx  www.ccil.org/~cowan  www.reutershealth.com
If he has seen farther than others,
        it is because he is standing on a stack of dwarves.
                --Mike Champion, describing Tim Berners-Lee (adapted)