Re: Why are byte ports "ports" as such?

I do think that the semantics of having byte and character
operations on the same port are a mess, and that the standard
ought not require it.  I support the notion that byte ports
and character ports ought to be separate types, opened by
separate routines, and that it ought to be an error (or at
least, invoke implementation-defined behavior) to attempt to
read a char from a byte port or vice versa.

When you read a byte, or a sequence of bytes, from a character
port, what state do you leave that port in?  Likewise, when you
read a character, or a sequence of characters, from a byte port,
how many bytes are gone?  The answer, in a non-ASCII universe,
is that you just don't know.
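
To make that concrete, here is a small illustration (it uses the
R7RS-style string->utf8, which is an assumption here, not a SRFI 91
procedure): under a variable-width encoding such as UTF-8, a single
codepoint may occupy anywhere from one to four bytes, so there is no
fixed answer to "how many bytes did that character read consume?"

    ;; Bytes per codepoint under UTF-8 -- not a constant:
    (bytevector-length (string->utf8 "a"))    ; => 1
    (bytevector-length (string->utf8 "é"))    ; => 2
    (bytevector-length (string->utf8 "€"))    ; => 3
    (bytevector-length (string->utf8 "𝄞"))    ; => 4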

If you use a character encoding that has multibyte sequences
for some Unicode codepoints, you can be left with up to seven
bytes that are the "trailing part" of a codepoint before the
next codepoint begins.  And given combining codepoints and
variation selectors, the next codepoint may not begin a new
character itself.  It is just nuts to read arbitrary bytes off
a character port and then try to read a character out of the
random mess left behind.
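
For instance (again a sketch in R7RS-style notation; the character
syntax #\x0301 and string->utf8 are assumptions, not SRFI 91
procedures), "e" followed by the combining acute accent U+0301 is one
displayed character but two codepoints and three UTF-8 bytes:

    (define s (string #\e #\x0301))          ; e + combining acute
    (string-length s)                        ; => 2 codepoints
    (bytevector-length (string->utf8 s))     ; => 3 bytes

Pull one or two bytes off a character port holding that data and what
remains is neither a whole codepoint nor a whole character.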

Conversely, the premier use for reading a specific number of
bytes is the implementation of communication protocols and other
applications where you have fields of particular byte lengths
coming from a serial connection or storage.  When you read a
character off one of these, your byte counts go kablooey and you
don't know where the next field starts.  Uniform-size records and
length-delimited fields, as used in databases and C FFIs, simply
cannot be handled by a port that you read a "character" off of.
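
Here is a minimal sketch of what that kind of code looks like,
written against R7RS-style bytevector-port procedures (read-u8,
read-bytevector, open-input-bytevector -- the names a byte-port SRFI
settles on may differ): a two-byte big-endian length prefix followed
by exactly that many payload bytes.  The arithmetic only works if
nothing but byte reads ever touch the port.

    ;; Read one length-delimited field from a byte port.
    (define (read-field port)
      (let* ((hi  (read-u8 port))
             (lo  (read-u8 port))
             (len (+ (* 256 hi) lo)))
        ;; read exactly len payload bytes -- no more, no less
        (read-bytevector len port)))

    (define p (open-input-bytevector (bytevector 0 3 65 66 67)))
    (read-field p)                           ; => #u8(65 66 67)

Interleave one character read in there and the "next field starts at
byte N" bookkeeping is gone.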

			Bear