Title

Port I/O

Author

Michael Sperber

Status

This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-81@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2005-10-08
Draft: 2005-11-24
Withdrawn: 2006-11-20

Abstract

This SRFI defines an I/O layer similar in nature to the ports subsystem in R5RS, and provides conventional, imperative buffered input and output.

The layer architecture is similar to the upper three layers of the I/O subsystem in The Standard ML Basis Library.

In particular, the subsystem fulfills the following requirements:

buffered reading and writing
binary and text I/O, mixed if needed
the ability to create arbitrary I/O ports from readers and writers

It builds on the Primitive I/O layer specified in SRFI 79 (Primitive I/O).

Rationale

This SRFI is meant as a compelling replacement for the R5RS I/O subsystem.

The design of this SRFI is driven, on the one hand, by the requirements listed in the abstract. On the other hand, it is meant to integrate fully into a three-layer I/O subsystem in which ports can be built on top of streams, as specified in SRFI 80 (Stream I/O). Ports can be implemented independently, however.

Specification

Prerequisites

This SRFI is meant for Scheme implementations with the following prerequisites:

Unicode support

This SRFI assumes that the char datatype in Scheme corresponds to Unicode scalar values. This, in turn, means that strings are represented as vectors of scalar values. (Note that this is consistent with SRFI 14 (Character-set library) and SRFI 75 (R6RS Unicode data).) It may be possible to make this SRFI work in an ASCII- or Latin-1-only system, but I have not made any special provisions to ensure this.

Filenames

Filenames in this SRFI are the same as in SRFI 79 (Primitive I/O).

General remarks

For procedures that have no "natural" return value, this SRFI often uses the sentence

The return values are unspecified.

This means that both the number of return values and the return values themselves are unspecified. However, the number of return values is such that it is accepted by a continuation created by begin. Specifically, on Scheme implementations where continuations created by begin accept an arbitrary number of arguments (this includes most implementations), it is suggested that the procedure return zero return values.

Blobs

The specification frequently refers to blobs. These are as specified in SRFI 74 (Octet-Addressed Binary Blocks).

File options

File options are as in SRFI 79 (Primitive I/O).

Buffer modes

Each output port has an associated buffer mode that defines when an output operation will flush the buffer associated with the output port. The possible buffer modes are none for no buffering, line for flushing upon newlines, and block for block-based buffering.

While this SRFI does not require buffer modes to form a distinct type, implementors are encouraged to make them a distinct type.

In systems implementing SRFI 80 (Stream I/O), the buffer modes may or may not be identical to those defined here.

(buffer-mode name) (syntax)

Name must be one of the identifiers none, line, and block. This returns a buffer-mode object denoting the associated buffer mode. There is only one such object for each mode, so a program can compare them using eq?.

(buffer-mode? obj)

This returns #t if the argument is a buffer-mode object, #f otherwise.
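
Because each mode is denoted by a single object, buffer modes can be compared with eq?. The following is a minimal sketch that assumes an implementation of this SRFI is loaded. The helper unbuffered? is hypothetical and not part of the specification; it relies on output-port-buffer-mode from the section on output ports.

```scheme
;; Sketch only; assumes an implementation of this SRFI.
;; unbuffered? is a hypothetical helper, not part of this SRFI.
(define (unbuffered? port)
  (eq? (output-port-buffer-mode port) (buffer-mode none)))

(buffer-mode? (buffer-mode block))          ; a buffer-mode object
(eq? (buffer-mode line) (buffer-mode line)) ; one object per mode
```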

Text Transcoders

This part of the SRFI provides pre-packaged functionality for encoding and decoding text in some common encodings. A transcoder is an opaque object encapsulating a specific text encoding. This SRFI specifies how to obtain a transcoder given a text encoder/decoder (or codec for short) and a specified newline encoding.

In systems implementing SRFI 80 (Stream I/O), the transcoders specified here may or may not be identical to those defined there.

(transcoder (codec codec) (eol-style eol-style)) (syntax)

This constructs a transcoder object from a specified codec and a specified end-of-line style. The codec and the eol-style clauses are both optional. If present, codec and eol-style must be expressions that evaluate to a codec and an eol-style object, respectively. If not present, the codec defaults to "no codec" (corresponding to UTF-8), and the eol-style object defaults to the platform's standard EOL convention.

Any operands to a transcoder form that do not match the above syntax may be platform-specific extensions. The implementation is free to ignore them, but must not signal an error.

(update-transcoder old (codec codec) (eol-style eol-style)) (syntax)

This form returns a new transcoder object constructed from an old one, with the codec and eol-style fields replaced by the specified values. (Again, the codec and the eol-style clauses are both optional. Also, unrecognized operands can be ignored, but cannot signal an error.)

(eol-style lf) (syntax)

(eol-style crlf) (syntax)

(eol-style cr) (syntax)

These forms evaluate to end-of-line-style objects - lf stands for using U+000A, crlf stands for using U+000D U+000A, and cr stands for using U+000D as end-of-line.

latin-1-codec

utf-16le-codec

utf-16be-codec

utf-32le-codec

utf-32be-codec

These are predefined codecs for the ISO 8859-1, UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings.
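
As an illustration of the forms above, the following sketch builds a few transcoders. It assumes an implementation of this SRFI; the variable names are purely illustrative. Note that in each clause the first identifier is the clause keyword and the operand is an ordinary expression, which is why eol-style appears twice.

```scheme
;; Sketch only; assumes an implementation of this SRFI.
;; The variable names are illustrative.

;; UTF-16LE text with CRLF line endings.
(define utf-16le/crlf
  (transcoder (codec utf-16le-codec)
              (eol-style (eol-style crlf))))

;; Same codec, but with LF line endings.
(define utf-16le/lf
  (update-transcoder utf-16le/crlf
                     (eol-style (eol-style lf))))

;; Both clauses omitted: UTF-8 with the platform's EOL convention.
(define default-transcoder (transcoder))
```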

Input and Output Ports

This SRFI provides buffered I/O based on ports. Ports, like the streams in SRFI 80 (Stream I/O), allow buffered I/O on the underlying data sources and destinations. Input ports, like output ports, are imperative; a read operation destructively removes data from the port. The port layer is very similar, but not identical, to the R5RS I/O system.

The Port I/O layer introduces one condition type of its own.

(define-condition-type &i/o-port-error &i/o-error
  i/o-port-error?
  (port i/o-error-port))

This condition type allows specifying the particular port with which an I/O error is associated. The port field serves a purely informational purpose. Conditions raised by Port I/O procedures may include an &i/o-port-error condition, but are not required to do so.

The specifications of some of the procedures that return ports contain the sentence:

The input (or output) port may or may not be a stream port.

This is a reference to SRFI 82 (Stream Ports) and ensures compatibility with that SRFI. However, this SRFI in no way depends on SRFI 82.

Input ports

(input-port? obj)

This returns #t if the argument is an input port, #f otherwise.

(read-blob-some input-port)

If any data is available in input-port before the next end of file, this returns a freshly allocated blob of non-zero size containing that data, and updates input-port to point exactly past the data read. If an end of file has been reached, it returns #f, and the input port is updated to point just past the end of file.

(read-u8 input-port)

If an octet is available before the next end of file, this returns that octet as an exact integer, and updates input-port to point exactly past the octet read. If an end of file has been reached, it returns #f, and the input port is updated to point just past the end of file.

(read-blob-n input-port n)

N must be an exact, non-negative integer, specifying the number of octets to be read. This tries to read n octets. If n or more octets are available before the next end of file, it returns a blob of size n. If fewer octets are available before the next end of file, it returns a blob containing those octets. In either case, the input port is updated to point exactly past the data read. If end of file has been reached, this returns #f, and the input port is updated to point just past the end of file.

(read-blob-n! input-port blob start count)

Count must be an exact, non-negative integer, specifying the number of octets to be read. Blob must be a blob with at least (+ start count) elements. This tries to read count octets. If count or more octets are available before the next end of file, they are written into blob starting at index start, and it returns count. If fewer octets are available before the next end of file, it writes the available octets into blob starting at index start, and it returns the number of octets actually read. In either case, the input port is updated to point exactly past the data read. If end of file has been reached, this returns #f, and it updates the input port to point just past the end of file. This procedure will block until either data is available or end of file is reached.
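
The three possible outcomes of read-blob-n! (a full read, a short read, and end of file) can be told apart by the return value. The following sketch assumes an implementation of this SRFI and of SRFI 74; read-exactly is a hypothetical helper that treats a short read like end of file, and it assumes size is positive.

```scheme
;; Sketch only; assumes implementations of this SRFI and SRFI 74.
;; read-exactly is a hypothetical helper, not part of this SRFI.
;; Returns a blob of exactly size octets, or #f if the port
;; reaches end of file before size octets could be read.
(define (read-exactly port size)
  (let* ((blob (make-blob size))
         (got (read-blob-n! port blob 0 size)))
    (if (and got (= got size))
        blob      ; full read
        #f)))     ; short read or end of file
```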

(read-blob-all input-port)

If data is available before the next end of file, this returns a blob containing all octets until that end of file. If not, read-blob-all returns #f. The input port is updated to point just past the end of file. Note that this function may block indefinitely on ports connected to interactive devices, even though data is available.

(read-string input-port)

If any data representing a string is available before the next end of file, this returns a string of non-zero size containing the UTF-8 decoding of that data. The input port is updated to point just past the data read. If an end of file has been reached, it returns #f, and the input port is updated to point just past the end of file. This procedure will block until either data is available or end of file is reached.

(read-char input-port)

If a char is available before the next end of file, this returns that char, and the input port is updated to point past the data read. If an end of file has been reached, this returns #f, and the input port is updated to point just past the end of file. This procedure will block until either data is available or end of file is reached.

(read-string-n input-port n)

N must be an exact, non-negative integer, specifying the number of chars to be read. It tries to read n chars. If n or more chars are available before the next end of file, it returns a string of size n consisting of those chars. If fewer chars are available before the next end of file, it returns a string containing those chars. In either case, the input port is updated to point exactly past the data read. If end of file has been reached, it returns #f, and the input port is updated to point just past the end of file. This procedure will block until either data is available or end of file is reached.

(read-string-n! input-port string start count)

Count must be an exact, non-negative integer, specifying the number of chars to be read. It tries to read count chars. If count or more chars are available before the next end of file, they are written into string starting at index start, and it returns count. If fewer chars are available before the next end of file, it writes the available chars into string starting at index start, and it returns the number of chars actually read. In either case, the input port is updated to point exactly past the data read. If end of file has been reached, it returns #f, and the input port is updated to point just past the end of file. This procedure will block until either data is available or end of file is reached.

(read-string-all input-port)

If data is available before the next end of file, the value returned is a string containing all the text until the next end of file. If no data is available, the value is #f. The input port is updated to point just past the end of file. Note that this function may block indefinitely on interactive ports.

(peek-u8 input-port)

This is the same as read-u8, but does not advance the port.

(peek-char input-port)

This is the same as read-char, but does not advance the port.

(port-eof? input-port)

Returns #t if the port is currently pointing at an end-of-file, #f otherwise.
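
The peek procedures make one-item lookahead possible without consuming input. The following sketch assumes an implementation of this SRFI; skip-spaces is a hypothetical helper that consumes leading space characters and stops at the first other char or at end of file.

```scheme
;; Sketch only; assumes an implementation of this SRFI.
;; skip-spaces is a hypothetical helper, not part of this SRFI.
(define (skip-spaces port)
  (let loop ()
    (let ((c (peek-char port)))          ; #f at end of file
      (if (and c (char=? c #\space))
          (begin
            (read-char port)             ; consume the space
            (loop))))))
```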

(input-port-position input-port)

This returns the octet position corresponding to the next octet read from the input port. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream input port on a truncated stream, or a translated stream.

(set-input-port-position! input-port pos)

Pos must be a non-negative exact integer. This sets the current octet position of input-port to pos. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream input port on a terminated stream, or a translated stream.

(transcode-input-port! input-port transcoder)

This transcodes input-port according to the encoding specified by transcoder, assuming input-port was previously not transcoded. The port will henceforth translate the data arriving at the port into UTF-8 with end-of-line encoded by U+000A.

It is an error for input-port to be transcoded upon the call to transcode-input-port!.

(close-input-port input-port)

This closes input-port, rendering the port incapable of delivering data. This has no effect if the port has already been closed. The return values are unspecified.

(open-file-input-port filename)

(open-file-input-port filename file-options)

(open-file-input-port filename file-options transcoder)

This returns an input port for the named file. The input port may or may not be a stream port. The file-options object defaults to (file-options) if not present. It may determine various aspects of the returned port; see the section on file options. If a transcoder transcoder is specified, the port is appropriately transcoded.

(open-blob-input-port blob)

(open-blob-input-port blob transcoder)

This returns an input port, associated with a blob stream on the blob blob. The input port may or may not be a stream port. If a transcoder transcoder is specified, the port is appropriately transcoded.

(open-string-input-port string)

(open-string-input-port string transcoder)

This returns an input port, associated with a blob stream on the UTF-8 encoding of string string. The input port may or may not be a stream port. If a transcoder transcoder is specified, the port is appropriately transcoded.
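
String ports offer a convenient way to exercise the reading procedures. The following sketch assumes an implementation of this SRFI; take-chars is a hypothetical helper that returns at most the first n chars of a string by reading them from a string input port.

```scheme
;; Sketch only; assumes an implementation of this SRFI.
;; take-chars is a hypothetical helper, not part of this SRFI.
(define (take-chars s n)
  (call-with-input-port (open-string-input-port s)
    (lambda (port)
      (read-string-n port n))))
```

Under the specification of read-string-n, (take-chars "hello" 3) would yield "hel", (take-chars "hi" 3) would yield "hi", and (take-chars "" 3) would yield #f, because end of file is reached before any char is available.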

(call-with-input-port input-port proc)

This calls proc with input-port as an argument. If proc returns, then the port is closed automatically and the values returned by proc are returned. If proc does not return, then the port will not be closed automatically, unless it is possible to prove that the port will never again be used for a read operation.

(standard-input-port)

Returns an input port connected to standard input, possibly a fresh one on each call. Note that a program that may run in a system where other programs also read concurrently from a port returned by standard-input-port should not keep the returned port object live for too long without reading from it. Specifically, it may be a stream port connected to a standard-input stream, and standard input read in other parts of the program may accumulate buffer space.

Output ports

(output-port? obj)

This returns #t if the argument is an output port, #f otherwise.

(write-blob output-port blob)

(write-blob output-port blob start)

(write-blob output-port blob start count)

Start and count must be non-negative exact integers that default to 0 and (- (blob-length blob) start), respectively. This writes the count octets in blob blob starting at index start to the output port. It is an error if the blob actually has size less than (+ start count). The return values are unspecified.

(write-u8 output-port octet)

This writes the octet octet to the output port. The return values are unspecified.

(write-string-n output-port string)

(write-string-n output-port string start)

(write-string-n output-port string start count)

Start and count must be non-negative exact integers. Start defaults to 0. Count defaults to (- (string-length string) start). This writes the UTF-8 encoding of the substring (substring string start (+ start count)) to the port. The return values are unspecified.
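
As a small illustration of the optional arguments, the following sketch writes part of a string to a string output port. It assumes an implementation of this SRFI and reads start and count as selecting the chars from index start up to, but not including, index (+ start count). The variable name is illustrative.

```scheme
;; Sketch only; assumes an implementation of this SRFI.
;; Writes 5 chars of the string, starting at index 7.
(define greeting-tail
  (call-with-string-output-port
    (lambda (port)
      (write-string-n port "hello, world" 7 5))))
```

Under that reading, greeting-tail would be the string "world".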

(write-char output-port char)

This writes the UTF-8 encoding of the char char to the port. The return values are unspecified.

(newline output-port)

This is equivalent to (write-char output-port #\newline). The return values are unspecified.

(flush-output-port output-port)

This flushes any output from the buffer of output-port to the underlying data or device. The return values are unspecified.

(output-port-buffer-mode output-port)

This returns the buffer-mode object of output-port.

(set-output-port-buffer-mode! output-port buffer-mode)

If the current buffer mode of output-port is something other than none and buffer-mode is the none buffer-mode object, this will first flush the output port. Then, it sets the buffer-mode object associated with output-port to buffer-mode. The return values are unspecified.
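
The following sketch shows how a program might switch buffer modes on the standard output port. It assumes an implementation of this SRFI; the variable out is illustrative, and when buffered output actually reaches the device depends on the implementation.

```scheme
;; Sketch only; assumes an implementation of this SRFI.
(define out (standard-output-port))

(set-output-port-buffer-mode! out (buffer-mode line))
(write-string-n out "working...")   ; may remain in the buffer
(newline out)                       ; line mode flushes upon newlines

;; Switching to none first flushes the port, then changes the mode.
(set-output-port-buffer-mode! out (buffer-mode none))
(write-string-n out "done")
(newline out)
```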

(output-port-position output-port)

This returns the position corresponding to the next octet written to the output port. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream output port on a terminated stream, or a translated stream.

(set-output-port-position! output-port pos)

Pos must be a non-negative exact integer. This flushes the output port and sets its current octet position to pos. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream output port on a terminated stream, or a translated stream.

(transcode-output-port! output-port transcoder)

This transcodes output-port, translating the data fed into output-port into the encoding specified by transcoder, assuming it is encoded as UTF-8 with end-of-line encoded by U+000A.

This assumes output-port was previously not transcoded. It is an error for output-port to be transcoded upon the call to transcode-output-port!.

(close-output-port output-port)

This closes output-port, rendering the port incapable of accepting data. This has no effect if the port has already been closed. The return values are unspecified.

(open-file-output-port filename)

(open-file-output-port filename file-options)

(open-file-output-port filename file-options transcoder)

This returns an output port for the named file and the specified options (which default to (file-options)). The output port may or may not be a stream port. If a transcoder transcoder is specified, the port is appropriately transcoded.

(call-with-blob-output-port proc)

(call-with-blob-output-port proc transcoder)

Proc is a procedure accepting one argument. This creates an unbuffered output port connected to a blob writer, and calls proc with that output port as an argument. The output port may or may not be a stream port. The call to call-with-blob-output-port returns the blob associated with the port when proc returns. If a transcoder transcoder is specified, the port is appropriately transcoded.

(call-with-string-output-port proc)

(call-with-string-output-port proc transcoder)

Proc is a procedure accepting one argument. This creates an unbuffered output port connected to a blob writer, and calls proc with that port as an argument. The output port may or may not be a stream port. The call to call-with-string-output-port returns the UTF-8 decoding of the blob associated with the port when proc returns. If a transcoder transcoder is specified, the port is appropriately transcoded.

(call-with-output-port output-port proc)

This calls proc with output-port as an argument. If proc returns, then the port is closed automatically and the values returned by proc are returned. If proc does not return, then the port will not be closed automatically, unless it is possible to prove that the port will never again be used for a write operation.

(standard-output-port)

(standard-error-port)

Returns a port connected to the standard output or standard error, respectively.

Opening files for reading and writing

(open-file-input+output-ports filename)

(open-file-input+output-ports filename file-options)

(open-file-input+output-ports filename file-options transcoder)

This returns two values, an input port and an output port for the named file and the specified options (which default to (file-options)). The ports may or may not be stream ports. If a transcoder transcoder is specified, the ports are appropriately transcoded.

Ports from readers and writers

(open-reader-input-port reader)

(open-reader-input-port reader transcoder)

This returns an input port connected to the reader reader. If a transcoder transcoder is specified, the port is appropriately transcoded.

(open-writer-output-port writer buffer-mode)

(open-writer-output-port writer buffer-mode transcoder)

This returns an output port connected to the writer writer with buffering according to buffer-mode. If a transcoder transcoder is specified, the port is appropriately transcoded.

Design rationale

Encoding

Many I/O system implementations allow associating an encoding with a port, allowing the direct use of several different encodings with ports. The problem with this approach is that the encoding/decoding defines a mapping from binary data to text or vice versa. Because of this asymmetry, such mappings do not compose. The result is usually complications and restrictions in the I/O API, such as the inability to mix text and binary data.

This SRFI avoids this problem by specifying that textual I/O always uses UTF-8. If the target or source of an I/O port is to use a different encoding, a translated port needs to be used, for which this SRFI offers the required facilities. Consequently, text decoders and encoders are expressed as binary-to-binary mappings, and as such compose.
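
The binary-to-binary view can be made concrete by comparing the octets produced with and without a transcoder. The following sketch assumes an implementation of this SRFI and of SRFI 74; the variable names are illustrative.

```scheme
;; Sketch only; assumes implementations of this SRFI and SRFI 74.
(define e-acute (integer->char #xE9))   ; U+00E9

;; Untranscoded port: the char is written as UTF-8.
(define utf-8-octets
  (call-with-blob-output-port
    (lambda (port) (write-char port e-acute))))

;; Transcoded port: the UTF-8 octets are mapped to Latin-1 octets.
(define latin-1-octets
  (call-with-blob-output-port
    (lambda (port) (write-char port e-acute))
    (transcoder (codec latin-1-codec))))
```

Under the semantics described above, utf-8-octets would hold the two octets #xC3 #xA9, while latin-1-octets would hold the single octet #xE9.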

`display` vs `write`

R5RS calls the procedures for writing something to an output port write-<something>. In a previous revision of this SRFI, all were called display-<something>. R5RS doesn't offer a consistent rule for naming, as the display and write-char procedures behave identically on character arguments, whereas write and write-char do not.

Historically, it seems that the original proposal for the I/O subsystem in RnRS indeed called the procedure display-char. I do not know why it was renamed; probably for compatibility with Common Lisp, which also has write-char.

While the procedures in this SRFI follow a consistent naming scheme, consistency remains an issue for the procedures that R5RS calls read and write. The naming scheme proposed here suggests they be called read-datum and write-datum.

`char-ready?`

This SRFI intentionally does not provide char-ready?, which is part of R5RS. The original intention of the procedure seems to have been to interface with something like Unix select(2). With multi-byte encodings such as UTF-8, this is no longer sufficient: the procedure would really have to look at the actual input data in order to determine whether a complete character is actually present. This makes realistic implementations of char-ready? inconsistent with the user's expectations. A procedure byte-ready? would be more consistent. On the other hand, such a procedure is rarely useful in real-world programs, hard to specify to the point where it would be portably usable, and complicates all layers of the I/O system, as readers would have to provide an additional member procedure to enable its implementation. Moreover, a select(2)-like implementation is not possible on all platforms and all types of ports. Consequently, char-ready? and byte-ready? are not part of this SRFI.

`display`

This SRFI does not provide display, which is part of R5RS. Display is woefully underspecified, and mostly used for debug output. It seems display should be replaced by a procedure for formatted output, possibly augmented by handling of dynamically-bound "current ports".

Optional ports and argument order

The argument order of the procedures in this SRFI is different from R5RS: the port is always at the beginning, and it is mandatory. For a rationale, see the message by Taylor Campbell on the subject.

No distinct end of file object

In R5RS, the distinct type of end of file objects is primarily for the benefit of read, where end of file must be denoted by an object that read cannot normally return as a result of parsing the input. However, it does not seem necessary to drag in the complications of this separate object into the other I/O operations, where #f is perfectly adequate to represent end of file.
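
One practical consequence is that reading procedures combine directly with the => clause of cond. The following sketch assumes an implementation of this SRFI; copy-port is a hypothetical helper that copies all remaining data from an input port to an output port.

```scheme
;; Sketch only; assumes an implementation of this SRFI.
;; copy-port is a hypothetical helper, not part of this SRFI.
(define (copy-port in out)
  (let loop ()
    (cond
     ((read-blob-some in)          ; #f at end of file ends the loop
      => (lambda (blob)
           (write-blob out blob)
           (loop))))))
```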

Reference Implementation

Here is a tarball containing a reference implementation of this SRFI, along with implementations for SRFI 79 (Primitive I/O), SRFI 80 (Stream I/O), and SRFI 82 (Stream-Port I/O). It only runs on a version of Scheme 48 that has not been released at the time of writing of this SRFI.

However, its actual dependencies on Scheme 48 idiosyncrasies are few. Chief are its use of the module system, which is easily replaced by another, and the implementation of Unicode. To implement primitive readers and writers on files, the code only relies on suitable library procedures to open the files, and read-byte and write-byte procedures to read or write single bytes from an (R5RS) port, as well as a force-output procedure to flush a port.

The reference implementation has not been highly tuned, but I have spent a modest amount of time making the code deal with buffers in an economical manner. Because of this, the code is more complicated than it needs to be, but hopefully also more usable as a basis for implementing this SRFI in actual Scheme systems.

Examples

Many examples are adapted from The Standard ML Basis Library edited by Emden R. Gansner and John H. Reppy. Cambridge University Press, 2004.

The code makes liberal use of SRFI 1 (List Library), SRFI 11 (Syntax for receiving multiple values), and SRFI 26 (Notation for Specializing Parameters without Currying).

The tarball with the reference implementation contains these examples along with test cases for them.

;; Build a three-line string via a string output port.
(define three-lines-string
  (call-with-string-output-port
   (lambda (port)
     (write-string-n port "foo") (newline port)
     (write-string-n port "bar") (newline port)
     (write-string-n port "baz") (newline port))))

Read a file directly:

(define (get-contents filename)
  (call-with-input-port (open-file-input-port filename)
    read-blob-all))

Read a file octet-by-octet:

(define (get-contents-2 filename)
  (call-with-input-port (open-file-input-port filename)
    (lambda (port)
      (let loop ((accum '()))
        (let ((thing (read-u8 port)))
          (if (not thing)
              (list->blob (reverse accum))
              (loop (cons thing accum))))))))

(define (list->blob l)
  (let ((blob (make-blob (length l))))
    (let loop ((i 0) (l l))
      (if (null? l)
          blob
          (begin
            (blob-u8-set! blob i (car l))
            (loop (+ 1 i) (cdr l)))))))

Read file chunk-by-chunk:

(define (get-contents-3 filename)
  (call-with-input-port (open-file-input-port filename)
    (lambda (port)
      (let loop ((accum '()))
        (cond
         ((read-blob-some port)
          => (lambda (blob)
               (loop (cons blob accum))))
         (else
          (concatenate-blobs (reverse accum))))))))

(define (concatenate-blobs list)
  (let* ((size (fold + 0 (map blob-length list)))
         (result (make-blob size)))
    (let loop ((index 0)
               (blobs list))
      (if (null? blobs)
          result
          (let* ((b (car blobs))
                 (size (blob-length b)))
            (blob-copy! b 0 result index size)
            (loop (+ index size)
                  (cdr blobs)))))))

Acknowledgements

Sebastian Egner provided valuable comments on a draft of this SRFI. The posters to the SRFI 68 (Comprehensive I/O) discussion list provided many very valuable comments. Donovan Kolbly did thorough pre-draft editing. Any remaining mistakes are mine.

References

The Standard ML Basis Library edited by Emden R. Gansner and John H. Reppy. Cambridge University Press, 2004.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Donovan Kolbly