Re: Why are byte ports "ports" as such?
On 13-Apr-06, at 1:54 PM, Ben Goetter wrote:
If you separate byte ports from character ports, and separate input
ports from output ports (at least at the API level), you get an
easily type-checked interface. e.g.
open-input-file string [encoding keywords] -> input-character-port
read-char input-char-port -> character
open-input-file-raw string -> input-byte-port
read-byte input-byte-port -> integer
Did you read this section of the SRFI?
Byte ports support character I/O operations because with each byte
port is attached a character encoding specifying how characters are
encoded with bytes. It is incorrect to believe however that all ports
are byte ports. For example the ``string ports'' of SRFI 6 (Basic
String Ports) have no reason to be aware of the character to byte
encoding because they only deal with sequences of characters. So they
need not be byte ports. For this reason this SRFI views byte ports as
a subtype of character ports. Character ports support character I/O
operations and byte ports support character I/O operations and byte
I/O operations. All I/O operations which are valid on a character port
are also valid on a byte port. [Although not specified in this SRFI a
further generalization is ``object ports'' which are ports whose
fundamental I/O unit is the Scheme object. Character ports are object
ports because there is a standard encoding of (most) Scheme objects
SRFI 91 allows character I/O and binary I/O on byte ports because
often files use a format which mixes text and byte encoded data.
Viewing byte ports as a subtype of character ports is consistent with
current practice (i.e. "text files" are just binary files which
encode the characters with a sequence of bytes that depend on the
For your bidi ports, perhaps
open-input-output-file string [encoding keywords] -> input-char-
with the two ports sharing common buffer structure in the
It is a pain to carry those two ports around in the code when the
program needs to communicate bidirectionally with some other entity
(another process, a user at a terminal, a socket, etc). Moreover the
separation of a conceptually bidirectional channel into distinct
ports (input and output) destroys the conceptual link that they
have. This hinders program understanding. For example, with
bidirectional ports, (close-port port) closes both sides of the
bidirectional port (i.e. the link between the input and output port
is preserved). With two unidirectional ports you have to duplicate
some operations (closing ports, changing port settings, ...).
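To make the duplication concrete, here is a minimal sketch of what a program ends up writing when it only has unidirectional ports. It assumes R7RS procedures rather than the SRFI 91 API; the bidi-channel record and close-bidi-channel are hypothetical helpers, and string ports stand in for a real socket or process.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical record that re-creates the link between the two
;; unidirectional ports by hand.
(define-record-type bidi-channel
  (make-bidi-channel in out)
  bidi-channel?
  (in bidi-channel-in)
  (out bidi-channel-out))

;; Every operation that concerns the whole channel has to be
;; duplicated, once per direction.
(define (close-bidi-channel ch)
  (close-port (bidi-channel-in ch))
  (close-port (bidi-channel-out ch)))

;; String ports stand in for the two ends of a real connection.
(define ch (make-bidi-channel (open-input-string "ping")
                              (open-output-string)))

;; Copy four characters from the input side to the output side.
(write-string (read-string 4 (bidi-channel-in ch))
              (bidi-channel-out ch))

(close-bidi-channel ch)
(write (input-port-open? (bidi-channel-in ch)))   ; input side is closed
(newline)
(write (output-port-open? (bidi-channel-out ch))) ; output side is closed
(newline)
```

With a single bidirectional port the wrapper record and the paired close would not be needed.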
Often one needs to open a file or a structure initially as a byte
port, then decode subsequent sections of the sequence as characters
of a particular encoding. For that, a procedure like
cook-input-encoding integer input-byte-port [encoding keywords] ->
can return a port that promises to decode a certain number of
octets from the backing byte port with your encoding. It doesn't
handle variable-length structures well, though.
This is possible with SRFI 91. Just open the file (in buffered or
non-buffered mode), read your bytes, then read your characters.
If you need to read the characters first, open the file in
non-buffered mode, read your characters, then read your bytes
(after switching back to buffered mode if you wish).
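As a rough illustration of the bytes-then-characters pattern, here is a minimal sketch. It assumes R7RS (scheme base) procedures rather than the SRFI 91 API. Because R7RS binary ports do not accept read-char, the character step is done here by decoding the payload bytes with utf8->string. The data layout (a one-byte length followed by UTF-8 text) and the name read-length-prefixed-string are hypothetical.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical mixed-format data: a one-byte length header followed
;; by that many bytes of UTF-8 encoded text.
(define data
  (bytevector-append (bytevector 5) (string->utf8 "hello")))

(define (read-length-prefixed-string port)
  (let* ((len (read-u8 port))                   ; byte-level read
         (payload (read-bytevector len port)))  ; raw bytes of the text
    (utf8->string payload)))                    ; decode as characters

(define port (open-input-bytevector data))
(display (read-length-prefixed-string port))    ; prints hello
(newline)
```

With a SRFI 91 byte port the decoding step would instead be ordinary character reads on the same port.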
By the way I'm tempted to add string ports to this SRFI (compatible
with SRFI 6 of course), and the analog ports for u8vectors, i.e.
u8vector ports. String ports are character ports (but not byte
ports) and u8vector ports are byte ports (and character ports).
Something along these lines:
These would allow a more complete set of procedures for encoding and
decoding strings into u8vectors. For example:
(list char-encoding: 'UTF-8)
(lambda () (write-char (integer->char 1234))))
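For comparison, here is a minimal sketch of the same kind of character-to-bytes encoding. It assumes R7RS's string->utf8 rather than the u8vector ports discussed here; R7RS bytevectors stand in for u8vectors, and the encoding is fixed to UTF-8 instead of being chosen by a keyword.

```scheme
(import (scheme base) (scheme write))

;; Encode the single character with code point 1234 (U+04D2) as UTF-8.
(define encoded (string->utf8 (string (integer->char 1234))))

;; U+04D2 needs two bytes in UTF-8: 211 (#xD3) and 146 (#x92).
(write (bytevector-length encoded)) (newline)    ; 2
(write (bytevector-u8-ref encoded 0)) (newline)  ; 211
(write (bytevector-u8-ref encoded 1)) (newline)  ; 146

;; Decoding the bytes recovers the original character.
(write (char->integer (string-ref (utf8->string encoded) 0)))  ; 1234
(newline)
```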
I'm currently holding back to keep the SRFI lean, but I may change my
mind (or write a separate SRFI).
I like your read-substring and write-substring.