
Re: Why are byte ports "ports" as such?

This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.



On 13-Apr-06, at 1:54 PM, Ben Goetter wrote:

If you separate byte ports from character ports, and separate input ports from output ports (at least at the API level), you get an easily type-checked interface. e.g.

open-input-file string [encoding keywords] -> input-character-port
read-char input-char-port -> character
open-input-file-raw string -> input-byte-port
read-byte input-byte-port -> integer


Did you read this section of the SRFI?

Byte ports support character I/O operations because with each byte port is attached a character encoding specifying how characters are encoded with bytes. It is incorrect to believe however that all ports are byte ports. For example the ``string ports'' of SRFI 6 (Basic String Ports) have no reason to be aware of the character to byte encoding because they only deal with sequences of characters. So they need not be byte ports. For this reason this SRFI views byte ports as a subtype of character ports. Character ports support character I/O operations and byte ports support character I/O operations and byte I/O operations. All I/O operations which are valid on a character port are also valid on a byte port. [Although not specified in this SRFI a further generalization is ``object ports'' which are ports whose fundamental I/O unit is the Scheme object. Character ports are object ports because there is a standard encoding of (most) Scheme objects to characters.]

SRFI 91 allows character I/O and binary I/O on byte ports because often files use a format which mixes text and byte encoded data. Viewing byte ports as a subtype of character ports is consistent with current practice (i.e. "text files" are just binary files which encode the characters with a sequence of bytes that depend on the character encoding).
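To make the "text is bytes plus an encoding" view concrete, here is a minimal sketch in R7RS-small Scheme. R7RS keeps binary and textual ports separate, so it cannot express SRFI 91's byte-port subtype directly; instead the hypothetical helper `read-utf8-char` decodes one character from a binary port by hand, which lets byte reads and character reads be interleaved on the same port. It assumes well-formed UTF-8 and does no validation.

```scheme
(import (scheme base))

;; Hypothetical helper: decode one UTF-8 character from a binary input port.
;; Returns the eof object at end of input. Assumes well-formed UTF-8:
;; continuation bytes are not checked and must not be missing.
(define (read-utf8-char port)
  (let ((b0 (read-u8 port)))
    (if (eof-object? b0)
        b0
        (let-values (((extra acc)
                      (cond ((< b0 #x80) (values 0 b0))            ; 0xxxxxxx
                            ((< b0 #xE0) (values 1 (- b0 #xC0)))   ; 110xxxxx
                            ((< b0 #xF0) (values 2 (- b0 #xE0)))   ; 1110xxxx
                            (else        (values 3 (- b0 #xF0)))))) ; 11110xxx
          ;; Each continuation byte (10xxxxxx) contributes 6 more bits.
          (let loop ((i 0) (acc acc))
            (if (= i extra)
                (integer->char acc)
                (loop (+ i 1)
                      (+ (* acc 64) (- (read-u8 port) #x80)))))))))

;; A tag byte followed by UTF-8 text, all on one port.
(define p (open-input-bytevector (bytevector 7 211 146 65)))
(define tag (read-u8 p))        ; byte I/O: 7
(define c1 (read-utf8-char p))  ; character I/O: U+04D2
(define c2 (read-utf8-char p))  ; character I/O: #\A
```

In SRFI 91 the port's own character encoding would take the place of the hand-written decoder.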

For your bidi ports, perhaps

open-input-output-file string [encoding keywords] -> input-char-port output-char-port

with the two ports sharing common buffer structure in the implementation.


It is a pain to carry those two ports around in the code when the program needs to communicate bidirectionally with some other entity (another process, a user at a terminal, a socket, etc). Moreover the separation of a conceptually bidirectional channel into distinct ports (input and output) destroys the conceptual link that they have. This hinders program understanding. For example, with bidirectional ports (close-port port) will close both sides of the bidirectional port (i.e. the link between the input and output port is preserved). With two unidirectional ports you have to duplicate some operations (closing ports, changing port settings, ...).
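As a rough illustration of the duplication described above, the sketch below (R7RS-small Scheme) bundles an input port and an output port so that a single call closes both. The record type `bidi-port` and the procedure `close-bidi-port` are hypothetical names, and the string ports only stand in for a socket or pipe; a real bidirectional port would share one underlying channel rather than pair two independent objects.

```scheme
(import (scheme base))

;; Hypothetical record pairing the two directions of one channel.
(define-record-type bidi-port
  (make-bidi-port in out)
  bidi-port?
  (in  bidi-port-in)
  (out bidi-port-out))

;; One call closes both sides, keeping the link between them.
(define (close-bidi-port p)
  (close-port (bidi-port-in p))
  (close-port (bidi-port-out p)))

;; String ports stand in for the two halves of a socket.
(define b (make-bidi-port (open-input-string "ping") (open-output-string)))
(define request (read-line (bidi-port-in b)))
(write-string (string-append request " -> pong") (bidi-port-out b))
(define reply (get-output-string (bidi-port-out b))) ; fetch before closing
(close-bidi-port b)
```

Every other per-port operation (settings, flushing, and so on) would need a similar wrapper, which is exactly the bookkeeping a native bidirectional port avoids.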

Often one needs to open a file or a structure initially as a byte port, then decode subsequent sections of the sequence as characters of a particular encoding. For that, a procedure like

cook-input-encoding integer input-byte-port [encoding keywords] -> input-char-port

can return a port that promises to decode a certain number of octets from the backing byte port with your encoding. It doesn't handle variable-length structures well, though.


This is possible with SRFI 91. Just open the file (in buffered or non-buffered mode), read your bytes, then read your characters. If you need to read the characters first, open the file in non-buffered mode, read your characters, then read your bytes (after switching back to buffered mode if you wish).
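The byte-then-character pattern can be sketched in R7RS-small Scheme, where `read-bytevector` followed by `utf8->string` plays the role of `cook-input-encoding` for a known number of octets. The record layout (a one-byte length, that many bytes of UTF-8 text, then a trailer byte) and the name `read-record` are hypothetical, and the sketch assumes the length is nonzero and the input is complete.

```scheme
(import (scheme base))

;; Hypothetical layout: 1-byte length N, N bytes of UTF-8 text, 1 trailer byte.
;; All three parts are read from the same binary port.
(define (read-record port)
  (let* ((len     (read-u8 port))                            ; byte I/O
         (text    (utf8->string (read-bytevector len port))) ; decode N octets as text
         (trailer (read-u8 port)))                           ; back to byte I/O
    (list text trailer)))

;; "hi!" is 3 bytes (104 105 33); 42 is the trailer byte.
(define rec (read-record (open-input-bytevector (bytevector 3 104 105 33 42))))
;; rec => ("hi!" 42)
```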

By the way I'm tempted to add string ports to this SRFI (compatible with SRFI 6 of course), and the analog ports for u8vectors, i.e. u8vector ports. String ports are character ports (but not byte ports) and u8vector ports are byte ports (and character ports). Something along these lines:

(open-input-string string-or-settings)
(open-output-string [string-or-settings])
(open-string [string-or-settings])

and

(open-input-u8vector u8vector-or-settings)
(open-output-u8vector [u8vector-or-settings])
(open-u8vector [u8vector-or-settings])

These would allow a more complete set of procedures for encoding strings into u8vectors and decoding u8vectors back into strings. For example:

> (with-output-to-u8vector
    (list char-encoding: 'UTF-8)
    (lambda () (write-char (integer->char 1234))))
#u8(211 146)
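Without u8vector ports, the REPL example above can be approximated in R7RS-small Scheme by writing to a string port and encoding the result with `string->utf8`. `with-output-to-utf8-bytevector` is a hypothetical name, and unlike the proposed `with-output-to-u8vector` it supports only UTF-8 and encodes after the fact rather than inside the port.

```scheme
(import (scheme base))

;; Hypothetical UTF-8-only stand-in for the proposed with-output-to-u8vector:
;; run THUNK with a string port as the current output port, then encode
;; everything it wrote as UTF-8 bytes.
(define (with-output-to-utf8-bytevector thunk)
  (let ((out (open-output-string)))
    (parameterize ((current-output-port out))
      (thunk))
    (string->utf8 (get-output-string out))))

(define bytes
  (with-output-to-utf8-bytevector
    (lambda () (write-char (integer->char 1234)))))
;; bytes => #u8(211 146)
```

Going the other way, `(utf8->string bytes)` yields a one-character string containing U+04D2.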

I'm currently holding back to keep the SRFI lean, but I may change my mind (or write a separate SRFI).

I like your read-substring and write-substring.

Great.

Marc