SRFI 68: Comprehensive I/O

Title

Comprehensive I/O

Authors

Michael Sperber

Status

This SRFI is currently in "draft" status. To see an explanation of each status that a SRFI can hold, see here. It will remain in draft status until 2005/06/13, or as amended. To provide input on this SRFI, please mailto:srfi-68@srfi.schemers.org. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Received: 2005/04/06
Draft: 2005/04/13 - 2005/06/13
Revised: 2005/06/14
Revised: 2005/07/27
Revised: 2005/08/08
Revised: 2005/09/21

Abstract

This SRFI defines a comprehensive I/O subsystem for Scheme with three layers, where each layer is built on top of the one below it:

The lowest, primitive layer provides unbuffered I/O, and is close to what a typical operating system offers.
The middle layer builds on lazy, mostly functional buffered streams.
The upper layer is similar in nature to the ports subsystem in R5RS, and provides conventional, imperative buffered input and output.

The layer architecture is similar to the upper three layers of the I/O subsystem in The Standard ML Basis Library.

In particular, the subsystem provides

buffered reading and writing
arbitrary lookahead at the streams level
dynamic redirection of input or output at the ports level
binary and text I/O, mixed if needed
translated data streams
unbuffered I/O at the primitive layer
the ability to create arbitrary I/O streams, such as to and from blobs and strings

The subsystem does not provide

formatted I/O
non-blocking or selective I/O
portable filenames, or any functionality for manipulating filenames
filesystem operations
socket I/O
extremely high-throughput or zero-copy I/O

However, all of these could be added on top of one or several of the layers specified in this SRFI.

Rationale

The I/O subsystem in R5RS is too limited in many respects: It only provides for text I/O, it only allows reading at the character and the datum level, and some of the primitives are mis-designed or underspecified. As a result, almost every Scheme implementation has its own extensions to the I/O system, and rarely are two of these extensions compatible.

This SRFI is meant as a compelling replacement for the R5RS I/O subsystem. As such, it is a completely new design, and it is not based on the extensions a particular existing Scheme system provides. (In fact, it is probably, in its entirety, unlike what any existing Scheme system provides.) Moreover, it is meant to be a substrate for further extensions which can be built on top of the subsystem via the interface described here.

The design of this SRFI is driven by the requirements mentioned in the abstract on the one hand, and by the excellent design of the I/O subsystem in the Standard ML Basis Library on the other. The latter is also the reason why this SRFI is different from the extensions provided by any existing Scheme implementation, as none of them picked up on the Basis design, and because the Basis design seems superior to the extensions I have looked at. (Among those I have looked at are Scheme 48, scsh, Gambit-C, and PLT Scheme.)

Note, however, that this SRFI differs from the SML Basis in several important respects, among them the handling of textual I/O streams, the ability to define translated streams, and the absence of any functionality related to non-blocking I/O. The latter is more properly in the domain of a thread/event system; Concurrent ML shows that the SML Basis plays well with such a system, and I expect the same to hold true here. The text encoding/translation functionality is different mainly because it plays to a different substrate for representing text (based on Unicode; see below) than Standard ML.

Like the Standard ML Basis I/O subsystem, the I/O system specified in this SRFI is probably not suitable for maximal-throughput I/O, chiefly because it does not re-use buffers, and because the buffer objects are blobs, with no further constraints on alignment or GC behavior. However, I deemed the achievable performance as more than adequate for most applications---it seemed a small price to pay for the convenient programming model.

Specification

Prerequisites

This SRFI is meant for Scheme implementations with the following prerequisites:

support for SRFIs 34 (Exception Handling for Programs), 35 (Conditions), 74 (Octet-Addressed Binary Blocks)
Unicode support

Unicode support

This SRFI assumes that the char datatype in Scheme corresponds to Unicode scalar values. This, in turn, means that strings are represented as vectors of scalar values. (Note that this is consistent with SRFI 14 (Character-set library) and SRFI 75 (R6RS Unicode data).) It may be possible to make this SRFI work in an ASCII- or Latin-1-only system, but I have not made any special provisions to ensure this.

Filenames

Some of the procedures described here accept a file name, filename, as an argument. Valid values for such a filename include strings naming a file using the native notation of the operating system the Scheme implementation happens to be running on.

It is expected that a future SRFI will extend this set of values by a more abstract representation: This is necessary, as the most common operating systems do not really use strings for representing filenames, but rather octet sequences. Moreover, the string notation is difficult to manipulate and not very portable.

General remarks

For procedures that have no "natural" return value, this SRFI often uses the sentence

The return values are unspecified.

This means that the number of return values and the return values themselves are unspecified. However, the number of return values is such that it is accepted by a continuation created by begin. Specifically, on Scheme implementations where continuations created by begin accept an arbitrary number of arguments (this includes most implementations), it is suggested that the procedure return zero return values.

Organization

The I/O subsystem consists of three layers. Each layer can function independently from those above it. Moreover, each layer can be used without referring directly to the ones below it. Therefore, a Scheme implementation with a module system should offer each layer as an independent module. Moreover, some data extensions are common to all three I/O layers---specifically, the common I/O condition types.

Data extensions

Condition types

The I/O condition type hierarchy here is similar, but not identical, to the one described in SRFI 36 (I/O Conditions).

The following list depicts the I/O condition hierarchy; more detailed explanations of the condition types follow.

&error
- &i/o-error
  - &i/o-operation-error (has an operation field)
    - &i/o-operation-not-available-error
  - &i/o-read-error
  - &i/o-write-error
  - &i/o-closed-error
  - &i/o-invalid-position-error
  - &i/o-filename-error (has a filename field)
    - &i/o-malformed-filename-error
    - &i/o-file-protection-error
      - &i/o-file-is-read-only-error
    - &i/o-file-already-exists-error
    - &i/o-no-such-file-error

In exceptional situations not described as "it is an error", the procedures described in the specification below will raise an &i/o-error condition object. Except where explicitly specified, there is no guarantee that the raised condition object will contain all the information that would be applicable. It is recommended, however, that an implementation of this SRFI provide all information about an exceptional situation in the condition object that is available at the place where it is detected.

(define-condition-type &i/o-error &error i/o-error?): This is a supertype for a set of more specific I/O errors.
(define-condition-type &i/o-operation-error &i/o-error i/o-operation-error? (operation i/o-error-operation)): This condition type specifies an I/O error that occurred during a specific operation. Condition objects belonging to this type must specify the procedure that was called to perform the operation in the operation field.
(define-condition-type &i/o-operation-not-available-error &i/o-operation-error i/o-operation-not-available-error?): This condition type indicates that the program tried to perform an I/O operation that was not available.
(define-condition-type &i/o-read-error &i/o-error i/o-read-error?): This condition type specifies a read error that occurred during an I/O operation.
(define-condition-type &i/o-write-error &i/o-error i/o-write-error?): This condition type specifies a write error that occurred during an I/O operation.
(define-condition-type &i/o-invalid-position-error &i/o-error i/o-invalid-position-error? (position i/o-error-position)): This condition type specifies that an attempt to set the file position specified an invalid position.
(define-condition-type &i/o-closed-error &i/o-error i/o-closed-error?): A condition of this type specifies that an operation tried to operate on a closed I/O object under the assumption that it is open.
(define-condition-type &i/o-filename-error &i/o-error i/o-filename-error? (filename i/o-error-filename)): This condition type specifies an I/O error that occurred during an operation on a named file. Condition objects belonging to this type must specify a file name in the filename field.
(define-condition-type &i/o-malformed-filename-error &i/o-filename-error i/o-malformed-filename-error?): This condition type indicates that a file name had an invalid format.
(define-condition-type &i/o-file-protection-error &i/o-filename-error i/o-file-protection-error?): A condition of this type specifies that an operation tried to operate on a named file with insufficient access rights.
(define-condition-type &i/o-file-is-read-only-error &i/o-file-protection-error i/o-file-is-read-only-error?): A condition of this type specifies that an operation tried to operate on a named read-only file under the assumption that it is writeable.
(define-condition-type &i/o-file-already-exists-error &i/o-filename-error i/o-file-already-exists-error?): A condition of this type specifies that an operation tried to operate on an existing named file under the assumption that it does not exist.
(define-condition-type &i/o-file-exists-not-error &i/o-filename-error i/o-file-exists-not-error?): A condition of this type specifies that an operation tried to operate on a non-existent named file under the assumption that it exists.
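The practical consequence of the hierarchy is that a handler written for a supertype also receives every subtype, and can read the fields the subtype inherits. The following is only a Python model of that relationship, with exception classes standing in for condition types. The class names are hypothetical renderings of the SRFI names, and open_for_write is an invented operation used purely for illustration.

```python
# Python model of part of the condition hierarchy. Subclasses stand in for
# condition subtypes; none of these names are defined by the SRFI.

class IoError(Exception):
    """Stands in for &i/o-error."""


class IoFilenameError(IoError):
    """Stands in for &i/o-filename-error; carries the filename field."""

    def __init__(self, filename):
        super().__init__(filename)
        self.filename = filename


class IoFileProtectionError(IoFilenameError):
    """Stands in for &i/o-file-protection-error."""


class IoFileIsReadOnlyError(IoFileProtectionError):
    """Stands in for &i/o-file-is-read-only-error."""


def open_for_write(filename, read_only_files):
    # Hypothetical operation that fails on files known to be read-only.
    if filename in read_only_files:
        raise IoFileIsReadOnlyError(filename)
    return "writer for " + filename


def describe(filename, read_only_files):
    try:
        return open_for_write(filename, read_only_files)
    except IoFilenameError as condition:
        # A handler for the supertype also sees the more specific subtype
        # and can use the inherited filename field.
        return "cannot open " + condition.filename
```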

Buffer modes

Each output stream and each output port has an associated buffer mode that defines when an output operation will flush the buffer associated with the output stream. The possible buffer modes are none for no buffering, line for flushing upon newlines, and block for block-based buffering.

(buffer-mode name) (syntax): Name must be one of the identifiers none, line, and block. This returns a buffer-mode object denoting the associated buffer mode. There is only one such object for each mode, so a program can compare them using eq?.
(buffer-mode? obj): This returns #t if the argument is a buffer-mode object, #f otherwise.
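To make the three modes concrete, here is a small Python model of when a flush might happen under each of them. It is a sketch under assumptions: the BufferedSink class, its block_size parameter, and the exact flush policy are invented for illustration, since the text above only states the intent of each mode. Enum members play the role of the unique buffer-mode objects and are compared by identity, much as eq? would be used.

```python
from enum import Enum


class BufferMode(Enum):
    # One unique object per mode, comparable by identity.
    NONE = "none"
    LINE = "line"
    BLOCK = "block"


class BufferedSink:
    """Collects output and records every flush as one chunk."""

    def __init__(self, mode, block_size=4):
        self.mode = mode
        self.block_size = block_size  # assumed block size for this sketch
        self.pending = bytearray()    # octets not yet flushed
        self.flushed = []             # one bytes object per flush

    def flush(self):
        if self.pending:
            self.flushed.append(bytes(self.pending))
            self.pending.clear()

    def write(self, data):
        if self.mode is BufferMode.NONE:
            # No buffering: everything is passed on immediately.
            self.pending.extend(data)
            self.flush()
            return
        for octet in data:
            self.pending.append(octet)
            if self.mode is BufferMode.LINE:
                full = octet == 0x0A  # flush upon newline
            else:
                full = len(self.pending) >= self.block_size
            if full:
                self.flush()
```

With line mode, writing b"ab\ncd" in this model flushes b"ab\n" and leaves b"cd" pending until the next newline or an explicit flush.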

File Options

When opening a file for output, the various procedures in this SRFI accept a file-options object containing a set of flags that specify how the file is to be opened:

(file-options file-options-name...) (syntax)

The syntax file-options returns a file-options object with the specified options set. The following options may be used:

`create`	create file if it does not already exist
`exclusive`	an error will be raised if this option and `create` are both set and the file already exists
`truncate`	file is truncated
`append`	writes are appended to existing contents

Any options not in this list may be platform-specific extensions. The implementation is free to ignore them, but must not signal an error.

(file-options-include? file-options-1 file-options-2)

This returns #t if file-options-1 includes all of the flags listed in file-options-2, #f otherwise.

(file-options? obj)

This returns #t if the argument is a file-options object, #f otherwise.

(file-options-union file-options ...)

This returns a file-options object containing all the flags of its arguments.
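The three procedures above behave like operations on a set of flags. As an illustration only, the following Python model uses enum.Flag for the four options listed earlier; the Python names are assumptions, and platform-specific extension options are not modelled.

```python
import operator
from enum import Flag, auto
from functools import reduce


class FileOptions(Flag):
    # The four options named in this section.
    CREATE = auto()
    EXCLUSIVE = auto()
    TRUNCATE = auto()
    APPEND = auto()


def file_options_include(options_1, options_2):
    """True if options_1 contains every flag set in options_2."""
    return (options_1 & options_2) == options_2


def file_options_union(*options):
    """Combine the flags of all arguments; no arguments gives the empty set."""
    return reduce(operator.or_, options, FileOptions(0))
```

For example, the union of CREATE and TRUNCATE includes CREATE but does not include APPEND, and every set includes the empty set.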

Primitive I/O

The Primitive I/O layer is an abstraction of the low-level I/O system calls commonly available on file descriptors: streams can always perform their access through the abstractions provided by this layer. The objects representing I/O descriptors are called readers for input and writers for output. They are unbuffered and operate purely on binary data.

This layer only specifies a fairly small set of operations --- a subset of the Standard ML Basis PRIM_IO signature. Specifically, all functionality related to non-blocking I/O or polling is missing here. This is intentional, as this functionality should be integrated with the threads system of the underlying implementation, and is thus outside the scope of this (already large) SRFI. Instead, it is expected that the set of operations available on primitive I/O readers and writers will be augmented by future specifications, as will be the available constructors for these objects.

The Primitive I/O layer introduces one condition type of its own.

(define-condition-type &i/o-reader/writer-error &i/o-error i/o-reader/writer-error? (reader/writer i/o-error-reader/writer)): This condition type allows specifying the particular reader or writer with which an I/O error is associated. The reader/writer field serves a purely informational purpose. Conditions raised by Primitive I/O procedures may include an &i/o-reader/writer-error condition, but are not required to do so.

I/O buffers

(make-i/o-buffer size): This creates a blob of size size with undefined contents. Callers of the Primitive I/O procedures are encouraged to use blobs created by make-i/o-buffer because they might have alignment and placement characteristics that make reader-read! and writer-write! more efficient. (These procedures are still required to work on regular blobs, however.)

Readers

A reader object typically stands for a file or device descriptor, but can also represent the output of some algorithm, such as in the case of string readers. The sequence of octets represented is potentially unbounded, and is punctuated by end of file elements.

(reader? obj)

Returns #t if obj is a reader, otherwise returns #f.

(make-simple-reader id descriptor chunk-size read! available get-position set-position! end-position close)

Returns a reader object. Id is a string naming the reader, provided for informational purposes only. For a file, this will be a string representation of the file name. Descriptor is supposed to be the low-level object connected to the reader, such as the OS-level file descriptor or the source string in the case of a string reader.

Chunk-size must be a positive exact integer, and is the recommended efficient size of the read operations on this reader. This is typically the block size of the buffers of the operating system. Note that this is just a recommendation --- calls to the read! procedure (see below) will not necessarily use it. A value of 1 represents a recommendation to use unbuffered reads.

The remaining arguments are procedures --- get-position, set-position!, and end-position may be omitted, in which case the corresponding arguments must be #f.

(read! blob start count): Start and count must be non-negative exact integers. This reads up to count octets from the reader and writes them into blob, which must be a blob, starting at index start. blob must have at least start + count elements. This procedure returns the number of octets read as an exact integer. It returns 0 if it encounters an end of file, or if count is 0. This procedure blocks until at least one octet has been read or it has encountered end of file.
(available): This returns an estimate of the total number of available octets left in the stream. The return value is either an exact integer, or #f if no such estimate is possible. There is no guarantee that this estimate will have any specific relationship to the true number of available octets.
(get-position): This procedure, when present, returns the current position in the octet stream as an exact integer counting the number of octets since the beginning of the stream.
(set-position! pos): This procedure, when present, moves to position pos (which must be a non-negative exact integer) in the stream.
(end-position): This procedure, when present, returns the position in the octet stream of the next end of file, without changing the current position.
(close): This procedure marks the reader as closed, performs any necessary cleanup, and releases the resources associated with the reader. Further operations on the reader may signal an error.
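The contract of these procedures can be illustrated with a reader over an in-memory octet sequence. The sketch below is a Python model, not an implementation of make-simple-reader: make_bytes_reader and the dictionary keys are hypothetical names, a bytearray stands in for a blob, and the procedures are closures over shared state, as the argument list of make-simple-reader suggests.

```python
def make_bytes_reader(source):
    """Return closures modelling a reader over the bytes object source."""
    state = {"pos": 0, "closed": False}

    def read(blob, start, count):
        # Copy up to count octets into blob (a bytearray) at index start.
        chunk = source[state["pos"]:state["pos"] + count]
        blob[start:start + len(chunk)] = chunk
        state["pos"] += len(chunk)
        return len(chunk)  # 0 at end of file, or when count is 0

    def available():
        # For an in-memory source the estimate happens to be exact.
        return max(len(source) - state["pos"], 0)

    def get_position():
        return state["pos"]

    def set_position(pos):
        state["pos"] = pos

    def end_position():
        return len(source)

    def close():
        state["closed"] = True

    return {
        "read": read,
        "available": available,
        "get_position": get_position,
        "set_position": set_position,
        "end_position": end_position,
        "close": close,
    }
```

A caller cannot assume that a single read fills the requested count; it has to use the returned number of octets, and a return value of 0 for a non-zero count signals end of file.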

(reader-id reader)

This returns the value of the id field of the argument reader.

(reader-descriptor reader)

This returns the value of the descriptor field of the argument reader.

(reader-chunk-size reader)

This returns the value of the chunk-size field of the argument reader.

(reader-read! reader blob start count)

This calls the read! procedure of reader with the remaining arguments.

(reader-available reader)

This calls the available procedure of reader.

(reader-has-get-position? reader)

This returns #t if reader has a get-position procedure, and #f otherwise.

(reader-get-position reader)

This calls the get-position procedure of reader, if present. It is an error to call this procedure if reader does not have a get-position procedure.

(reader-has-set-position!? reader)

This returns #t if reader has a set-position! procedure, and #f otherwise.

(reader-set-position! reader pos)

This calls the set-position! procedure of reader with the pos argument, if present. It is an error to call this procedure if reader does not have a set-position! procedure.

(reader-has-end-position? reader)

This returns #t if reader has an end-position procedure, and #f otherwise.

(reader-end-position reader)

This calls the end-position procedure of reader, if present. It is an error to call this procedure if reader does not have an end-position procedure.

(reader-close reader)

This calls the close procedure of reader.

(open-blob-reader blob)

This returns a reader that uses a copy of blob, a blob, as its contents. This reader has get-position, set-position!, and end-position operations.

(open-file-reader filename)

This returns a reader connected to the file named by filename. This reader may or may not have get-position, set-position!, and end-position operations.

(standard-input-reader)

This returns a reader connected to the standard input. The meaning of "standard input" is implementation-dependent.

Writers

A writer object typically stands for a file or device descriptor, but can also represent the sink for the output of some algorithm, such as in the case of string writers.

(writer? obj)

Returns #t if obj is a writer, otherwise returns #f.

(make-simple-writer id descriptor chunk-size write! get-position set-position! end-position close)

Returns a writer object. Id is a string naming the writer, provided for informational purposes only. For a file, this will be a string representation of the file name. Descriptor is supposed to be the low-level object connected to the writer, such as the OS-level file descriptor.

Chunk-size must be a positive exact integer, and is the recommended efficient size of the write operations on this writer. This is typically the block size of the buffers of the operating system. Note that this is just a recommendation --- calls to the write! procedure (see below) will not necessarily use it. A value of 1 represents a recommendation to use unbuffered writes.

The remaining arguments are procedures --- get-position, set-position!, and end-position may be omitted, in which case the corresponding arguments must be #f.

(write! blob start count): Start and count must be non-negative exact integers. This writes up to count octets in blob blob starting at index start. Before it does this, it will block until it can write at least one octet. It returns the number of octets actually written as a positive exact integer.
(get-position): This procedure, when present, returns the current position in the octet stream as an exact integer counting the number of octets since the beginning of the stream.
(set-position! pos): This procedure, when present, moves to position pos (which must be a non-negative exact integer) in the stream.
(end-position): This procedure, when present, returns the octet position of the next end of file, without changing the current position.
(close): This procedure marks the writer as closed, performs any necessary cleanup, and releases the resources associated with the writer. Further operations on the writer may signal an error.
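Because write! may accept fewer than count octets, a caller that needs all of its data written has to loop. The following Python sketch shows that loop against a model writer; make_limited_writer and write_all are hypothetical helpers, and the limit on octets per call exists only to exercise the partial-write case.

```python
def make_limited_writer(max_per_call):
    """Model writer that accepts at most max_per_call octets per call."""
    sink = bytearray()

    def write(blob, start, count):
        n = min(count, max_per_call)
        sink.extend(blob[start:start + n])
        return n  # number of octets actually written

    return write, sink


def write_all(write, blob, start, count):
    """Call write repeatedly until all count octets have been accepted.

    Returns the number of calls that were needed.
    """
    calls = 0
    while count > 0:
        written = write(blob, start, count)
        start += written
        count -= written
        calls += 1
    return calls
```

The loop relies on the guarantee that each call writes at least one octet; without it, the loop could fail to make progress.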

(writer-id writer)

This returns the value of the id field of the argument writer.

(writer-descriptor writer)

This returns the value of the descriptor field of the argument writer.

(writer-chunk-size writer)

This returns the value of the chunk-size field of the argument writer.

(writer-write! writer blob start count)

This calls the write! procedure of writer with the remaining arguments.

(writer-has-get-position? writer)

This returns #t if writer has a get-position procedure, and #f otherwise.

(writer-get-position writer)

This calls the get-position procedure of writer, if present. It is an error to call this procedure if writer does not have a get-position procedure.

(writer-has-set-position!? writer)

This returns #t if writer has a set-position! procedure, and #f otherwise.

(writer-set-position! writer pos)

This calls the set-position! procedure of writer with the pos argument, if present. It is an error to call this procedure if writer does not have a set-position! procedure.

(writer-has-end-position? writer)

This returns #t if writer has an end-position procedure, and #f otherwise.

(writer-end-position writer)

This calls the end-position procedure of writer, if present. It is an error to call this procedure if writer does not have an end-position procedure.

(writer-close writer)

This calls the close procedure of writer.

(open-blob-writer)

This returns a writer that can yield everything written to it as a blob. This writer has get-position, set-position!, and end-position operations.

(writer-blob writer)

The writer argument must be a blob writer returned by open-blob-writer. This procedure returns a newly allocated blob containing the data written to writer in sequence. Doing this in no way invalidates the writer or changes its store.
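The key property of writer-blob is that it returns a snapshot: the result is a fresh copy, and the writer keeps accumulating afterwards. A minimal Python model, assuming a purely sequential writer (position operations are left out, and the BlobWriter class is a hypothetical name):

```python
class BlobWriter:
    """Model of a blob writer that only supports sequential writes."""

    def __init__(self):
        self._store = bytearray()

    def write(self, blob, start, count):
        # Append count octets of blob, beginning at index start.
        self._store.extend(blob[start:start + count])
        return count

    def blob(self):
        # A newly allocated copy; the internal store is left untouched.
        return bytes(self._store)
```

A blob taken before a later write does not change when more data is written; only a subsequent call to blob reflects the new data.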

(open-file-writer filename file-options)

This returns a writer connected to the file named by filename. The file-options object determines various aspects of the returned writer, see the section on file options. This writer may or may not have get-position, set-position!, and end-position operations.

(standard-output-writer)

This returns a writer connected to the standard output. The meaning of "standard output" is implementation-dependent.

(standard-error-writer)

This returns a writer connected to the standard error. The meaning of "standard error" is implementation-dependent.

Opening files for reading and writing

(open-file-reader+writer filename file-options): This returns a reader and a writer connected to the file named by filename. The file-options object determines various aspects of the returned writer, see the section on file options. This writer may or may not have get-position, set-position!, and end-position operations.
Note: This procedure enables opening a file for simultaneous input and output in environments where it is not possible to call open-file-reader and open-file-writer on the same file.

Stream I/O

The Stream I/O layer defines high-level I/O operations on two new datatypes: input streams and output streams. These operations include binary and textual I/O. Input streams are treated in lazy functional style: input from a stream s yields an object representing the input itself, and a new input stream s1. S will continue to represent exactly the same position within the input; to advance within the stream, the program needs to perform input from s1. Consequently, input streams allow arbitrary lookahead, which is especially convenient for all kinds of scanning.
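The lazy functional style can be illustrated with a small Python model. This is a sketch under simplifying assumptions: the InputStream class and the starts_with helper are hypothetical, the data is held entirely in memory, None stands in for #f, and a single permanent end of file is modelled rather than a sequence punctuated by several.

```python
class InputStream:
    """Immutable view of one position in a shared octet sequence."""

    def __init__(self, data, pos=0):
        self._data = data
        self._pos = pos

    def input_u8(self):
        """Return (octet, next_stream), or (None, next_stream) at end of file."""
        if self._pos < len(self._data):
            return self._data[self._pos], InputStream(self._data, self._pos + 1)
        return None, InputStream(self._data, self._pos)


def starts_with(stream, prefix):
    """Look ahead for prefix without disturbing the caller's stream."""
    for expected in prefix:
        octet, stream = stream.input_u8()  # rebinds only the local name
        if octet != expected:
            return False
    return True
```

Reading from the same stream object twice yields the same octet both times. Lookahead therefore needs no explicit "unread" operation: the caller simply keeps the earlier stream.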

Output streams are more conventional, as the lazy functional style does not make sense with output.

Both input streams and output streams are either directly connected to underlying readers and writers, or are defined by translation on an underlying stream. This makes it possible to perform trivial transformations such as CR/LF translation, but also transparent recoding on the streams.

Textual I/O always uses UTF-8 as the underlying encoding. Other encodings can easily be supported by translating to or from UTF-8 using the translation framework. If a decoding error occurs, the implicit decoder will skip the octet starting the character encoding, yield a ? character, and attempt to continue decoding after that initial octet.
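The error-recovery rule for decoding can be modelled in a few lines of Python. The sketch below is an illustration of the rule as stated above, not the SRFI's decoder: decode_lenient and utf8_length are hypothetical names, and Python's strict UTF-8 codec is used to decide whether a candidate sequence is well formed.

```python
def utf8_length(lead):
    """Length of the UTF-8 sequence announced by the lead octet."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 1  # continuation or invalid lead octet; strict decoding rejects it


def decode_lenient(data):
    """Decode UTF-8, yielding ? for each octet that starts a bad sequence."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_length(data[i])
        try:
            chars.append(data[i:i + n].decode("utf-8"))
            i += n
        except UnicodeDecodeError:
            # Skip only the octet that started the failed sequence, then
            # resume decoding at the octet immediately after it.
            chars.append("?")
            i += 1
    return "".join(chars)
```

Under this rule a truncated three-octet sequence such as the octets E2 82 yields two ? characters, because decoding resumes directly after the initial octet and the continuation octet that follows is itself invalid as a start.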

The Stream I/O layer adds an additional condition type:

(define-condition-type &i/o-stream-error &i/o-error
  i/o-stream-error?
  (stream i/o-error-stream))

The stream field serves a purely informational purpose. Conditions raised by Stream I/O procedures may include an &i/o-stream-error condition, but are not required to do so.

Input streams

Input streams come in two flavors: either directly based on a reader, or based on another input stream via translation. Input streams are in one of three states: active, truncated, or closed. When initially created, a stream is active. A program can retrieve the reader underlying an input stream---this automatically incurs disconnecting the stream from the reader, and puts the stream into the truncated state. When explicitly closed, the reader underlying an open input stream is closed as well. The closed state implies the truncated state.

Reading from a truncated stream is not an error; after all existing buffers have been exhausted, the stream behaves as if an infinite sequence of end of files followed.

(input-stream? obj)

This returns #t if the argument is an input stream, #f otherwise.

(input-blob-some input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If any data is available before the next end of file, this returns a freshly allocated blob of non-zero size containing that data. If an end of file has been reached, the value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-u8 input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If an octet is available before the next end of file, this returns that octet as an exact integer. If an end of file has been reached, the value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-blob-n input-stream n)

N must be an exact, non-negative integer, specifying the number of octets to be read. This returns two values: a value and another input stream. It tries to read n octets. If n or more octets are available before the next end of file, it returns a blob of size n. If fewer octets are available before the next end of file, it returns a blob containing those octets. The input stream returned points exactly past the data read. If end of file has been reached, the return value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.
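The three outcomes (a full blob, a short blob, or #f at end of file) lead to a natural chunked-reading loop. The following Python model is a sketch under assumptions: a stream is represented as a (data, position) pair, None stands in for #f, end of file is treated as permanent, and both function names are hypothetical.

```python
def input_blob_n(stream, n):
    """Model of input-blob-n; returns (blob_or_None, next_stream)."""
    data, pos = stream
    if pos >= len(data):
        return None, stream  # end of file has been reached
    blob = data[pos:pos + n]  # shorter than n if fewer octets remain
    return blob, (data, pos + len(blob))


def chunks(stream, n):
    """Read the whole stream in pieces of at most n octets (n must be > 0)."""
    pieces = []
    while True:
        blob, stream = input_blob_n(stream, n)
        if blob is None:
            return pieces
        pieces.append(blob)
```

Only the last piece can be shorter than n, and the loop ends on the first #f rather than on a short read.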

(input-blob-n! input-stream blob start count)

Count must be an exact, non-negative integer, specifying the number of octets to be read. Blob must be a blob with at least (+ start count) elements. This returns two values: a value and another input stream. It tries to read count octets. If count or more octets are available before the next end of file, they are written into blob starting at index start, and it returns count as the value. If fewer octets are available before the next end of file, it writes the available octets into blob starting at index start, and it returns the number of octets actually read as the value. The input stream returned points exactly past the data read. If end of file has been reached, the return value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-blob-all input-stream)

This returns two values: a value and another input stream. If data is available before the next end of file, the value is a blob containing all octets until that end of file. If not, the value is #f. The input stream returned points just past the end of file. Note that this function may block indefinitely on streams connected to interactive readers, even though data is available.

(input-string input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If any data representing a string is available before the next end of file, this returns a string of non-zero size containing the UTF-8 decoding of that data as the first return value. If an end of file has been reached, it returns #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-char input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If a char is available before the next end of file, this returns that char. If an end of file has been reached, the value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-string-n input-stream n)

N must be an exact, non-negative integer, specifying the number of chars to be read. This returns two values: a value and another input stream. The input stream returned points exactly past the data read. It tries to read n chars. If n or more chars are available before the next end of file, it returns a string of size n consisting of those chars. If fewer chars are available before the next end of file, it returns a string containing those chars. If end of file has been reached, it returns #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-string-n! input-stream string start count)

Count must be an exact, non-negative integer, specifying the number of chars to be read. This returns two values: a value and another input stream. The input stream returned points exactly past the data read. It tries to read count chars. If count or more chars are available before the next end of file, they are written into string starting at index start, and it returns count as the value. If fewer chars are available before the next end of file, it writes the available chars into string starting at index start, and it returns the number of chars actually read as the value. If end of file has been reached, it returns #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-string-all input-stream)

This returns two values: a value and another input stream. If data is available before the next end of file, the value returned is a string containing all text until the next end of file. If no data is available, the value is #f. The input stream returned points just past the end of file. Note that this function may block indefinitely on streams connected to interactive readers, even though data is available.

(input-line input-stream)

This returns two values: a value and another input stream. If data is available before the next newline char, the value is a string that contains all text until the newline char. The input stream returned points just past the newline char. If end of file has been reached, the value is #f.

(stream-eof? input-stream)

This returns #t if the stream has reached end of file, #f otherwise.

(input-stream-position input-stream)

For reader-based input streams, this returns the reader position corresponding to the next octet read from the input stream. This procedure raises an &i/o-operation-not-available-error condition if the stream does not support the operation. It is an error to apply this procedure to a truncated or closed stream, or to a translated stream.

(input-stream-underliers input-stream)

Input-stream must be an open input stream. This returns two values. If the stream is translated, the first value is the underlying stream, and the second value is the translator procedure. If the stream is based on a reader, this returns the reader as the first value and #f as the second. Moreover, input-stream is put into the truncated state.

Note that, in the case of a translated stream, the returned underlying stream may already point past the data read by operations on input-stream due to buffering.

(input-stream-reader+constructor input-stream)

Input-stream must be an open input stream. This returns two values: a reader and a procedure of one argument. The reader is the underlying reader of the stream at the end of the chain of translations whose head is input-stream. The procedure consumes a reader as its argument and returns a fresh input stream with the same chain of translations as input-stream. This also disconnects the input stream from the reader and puts it into the truncated state; all other input streams based on the input stream at the end of the translation chain (directly or indirectly) are also put into the truncated state.

(close-input-stream input-stream)

This closes the underlying reader if input-stream is still open, and marks the input stream as closed. Applying close-input-stream to a closed stream has no effect. Closing an input stream also closes all input streams that are translations (directly or indirectly) of the input stream at the end of its own translation chain. The return values are unspecified.

(open-file-input-stream filename)

This opens a reader connected to the file named by filename and returns an input stream connected to it.

(open-blob-input-stream blob)

This opens a blob reader connected to blob and returns an input stream connected to it.

(open-string-input-stream string)

This opens a blob reader connected to the UTF-8 encoding of string and returns an input stream connected to it.

(open-reader-input-stream reader)

(open-reader-input-stream reader blob)

This returns an input stream connected to the reader reader.

If blob is present, the stream will use it as its initial buffer contents. Subsequently writing to blob directly has unspecified consequences.

(call-with-input-stream input-stream proc)

This calls proc with input-stream as an argument. If proc returns, then the stream is closed automatically and the values returned by proc are returned. If proc does not return, then the stream will not be closed automatically, unless it is possible to prove that the stream will never again be used for a read operation.
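
A minimal sketch of typical use, assuming an implementation of this SRFI is loaded: the helper below (file->string is a hypothetical name) reads a whole file as text and relies on call-with-input-stream to close the stream once proc returns.

```scheme
;; Hypothetical helper: returns the contents of the file named FILENAME
;; as a string, or the empty string if the file holds no data.
(define (file->string filename)
  (call-with-input-stream
    (open-file-input-stream filename)
    (lambda (in)
      (call-with-values
        (lambda () (input-string-all in))
        (lambda (text rest)
          ;; input-string-all delivers #f when no data is available
          (or text ""))))))
```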

(make-translated-input-stream input-stream translate-proc)

(make-translated-input-stream input-stream translate-proc blob)

This returns a translated input stream based on input-stream. Translate-proc must be a procedure that adheres to the following specification:

(translate-proc input-stream wish): Input-stream is the underlying input stream originally passed to make-translated-input-stream. Wish is either #f, #t, or an exact, non-negative integer, giving a hint as to how much data is requested: #f means a chunk of arbitrary size, suggesting that the user program called input-blob-some; #t means as much as possible, suggesting that the user program called input-blob-all; and an integer specifies the requested number of octets. Note that translate-proc can ignore wish. The procedure must return two values, a blob and another input stream, analogous to the various input-... procedures. A first value of #f denotes an end of file. The returned input stream points just past the data returned.

If blob is present, the stream will use it as its initial (translated) buffer contents. Subsequently writing to blob directly has unspecified consequences.
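
The following sketch shows one possible translate-proc, which upcases ASCII letters. It ignores wish, as the specification permits. The blob operations make-blob, blob-u8-ref and blob-u8-set! are assumed to be available under their SRFI 74 names (this section itself only mentions blob-length), and upcase-ascii-translate is a hypothetical name.

```scheme
;; Hypothetical translator: passes octets through, mapping the ASCII
;; letters a-z (97..122) to A-Z.  WISH is only a hint and is ignored.
(define (upcase-ascii-translate in wish)
  (call-with-values
    (lambda () (input-blob-some in))
    (lambda (chunk next)
      (if chunk
          (let* ((n (blob-length chunk))
                 (out (make-blob n)))          ; assumed SRFI 74 constructor
            (do ((i 0 (+ i 1)))
                ((= i n))
              (let ((o (blob-u8-ref chunk i)))
                (blob-u8-set! out i
                              (if (and (>= o 97) (<= o 122))
                                  (- o 32)
                                  o))))
            (values out next))
          ;; #f from the underlying stream: pass end of file along
          (values #f next)))))

(define shouting
  (make-translated-input-stream (open-string-input-stream "hello")
                                upcase-ascii-translate))
```

Reading from shouting with, say, input-string-all should then deliver the upcased text.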

(standard-input-stream)

This returns a freshly created stream connected to the standard input reader. Note that a program should not keep the returned stream live, as data read from standard input in other parts of the program would then accumulate in buffer space.

Output streams

Output streams, like input streams, come in two flavors: either directly based on a writer, or based on another output stream via translation.

Output streams are in one of three states: active, terminated, or closed. When initially created, a stream is active. A program can retrieve the writer underlying an output stream; this automatically disconnects the stream from the writer and puts the stream into the terminated state. When explicitly closed, the writer underlying an output stream enters the closed state. The closed state implies the terminated state.

It is an error to perform an output operation on a terminated stream.

(output-stream? obj)

This returns #t if the argument is an output stream, #f otherwise.

(output-blob output-stream blob start count)

(output-blob output-stream blob start)

(output-blob output-stream blob)

Start and count must be non-negative exact integers that default to 0 and (- (blob-length blob) start), respectively. This writes the count octets in blob blob starting at index start to the output stream. It is an error if the blob actually has size less than start + count. The return values are unspecified.

(output-u8 output-stream octet)

This writes the octet octet (which must be an exact integer in the range [0,255]) to the stream. The return values are unspecified.

(output-char output-stream char)

This writes the UTF-8 encoding of the char char to the stream. The return values are unspecified.

(output-string output-stream string start count)

(output-string output-stream string start)

(output-string output-stream string)

Start and count must be non-negative exact integers that default to 0 and (- (string-length string) start), respectively. This writes the UTF-8 encoding of the substring (substring string start (+ start count)) to the stream. The return values are unspecified.

(flush-output-stream output-stream)

This flushes any output from the buffer of output-stream to the underlying writer. It is a no-op if output-stream is terminated. The return values are unspecified.

(output-stream-position output-stream)

For writer-based output streams, this returns the writer position corresponding to the next octet written to the output stream. This procedure raises an &i/o-operation-not-available-error condition if the stream does not support the operation. It is an error to apply this procedure to a terminated or closed stream, or to a translated stream.

(set-output-stream-position! output-stream pos)

Pos must be a non-negative exact integer. This flushes the output stream and sets the current position of the underlying writer to pos. This procedure raises an &i/o-operation-not-available-error condition if the stream does not support the operation. It is an error to apply this procedure to a terminated or closed stream, or to a translated stream. The return values are unspecified.

(output-stream-underliers output-stream)

Output-stream must be an open output stream. First, this flushes output-stream. This returns three values: If output-stream is a translated stream, the first value is the underlying stream, the second value the translation procedure, and the third value the translation state of the stream. If it is directly based on a writer, the first return value is the writer; the second and the third are #f.

(output-stream-writer+constructor output-stream)

Output-stream must be an open output stream. First, this flushes output-stream. This returns two values: a writer and a procedure of one argument. The writer is the underlying writer of the stream at the end of the chain of translations whose head is output-stream. The procedure consumes a writer as its argument and returns a fresh output stream with the same chain of translations as output-stream, where each translation is in the same state as in the chain. This also disconnects the output stream from the writer and puts it into the terminated state; all other output streams based on the output stream at the end of the translation chain (directly or indirectly) are also put into the terminated state.

(close-output-stream output-stream)

This closes the underlying writer if output-stream is still open, and marks the output stream as closed. Applying close-output-stream to a closed stream has no effect. Closing an output stream also closes all output streams that are translations (directly or indirectly) of the output stream at the end of its own translation chain. The return values are unspecified.

(output-stream-buffer-mode output-stream)

This returns the buffer-mode object of output-stream.

(set-output-stream-buffer-mode! output-stream buffer-mode)

If the current buffer mode of output-stream is something other than none and buffer-mode is the none buffer-mode object, this will first flush the output stream. Then, it sets the buffer-mode object associated with output-stream to buffer-mode. The return values are unspecified.

(open-file-output-stream filename file-options)

This opens a writer connected to the file named by filename via open-file-writer (passing it file-options) and returns an output stream with unspecified buffering mode connected to it.

(open-writer-output-stream writer buffer-mode)

This returns an output stream connected to the writer writer with buffering according to buffer-mode.

(call-with-blob-output-stream proc)

Proc is a procedure accepting one argument. This creates an unbuffered output stream connected to a blob writer, and calls proc with that output stream as an argument. The call to call-with-blob-output-stream returns the blob associated with the stream when proc returns.

(call-with-string-output-stream proc)

Proc is a procedure accepting one argument. This creates an unbuffered output stream connected to a blob writer, and calls proc with that output stream as an argument. The call to call-with-string-output-stream returns the UTF-8 decoding of the blob associated with the stream when proc returns.
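
As a small sketch of how the output procedures combine, assuming an implementation of this SRFI is loaded, the following builds a string through an output stream. The name greeting is only illustrative.

```scheme
(define greeting
  (call-with-string-output-stream
    (lambda (out)
      ;; start = 0, count = 5: intended to write the substring "hello"
      (output-string out "hello, world" 0 5)
      (output-char out #\!)
      ;; a single octet; 10 is the UTF-8 encoding of U+000A
      (output-u8 out 10))))
```

Under the semantics described above, greeting should consist of the text hello! followed by a newline char.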

(call-with-output-stream output-stream proc)

This calls proc with output-stream as an argument. If proc returns, then the stream is closed automatically and the values returned by proc are returned. If proc does not return, then the stream will not be closed automatically, unless it is possible to prove that the stream will never again be used for a write operation.

(make-translated-output-stream output-stream translate-proc state)

This returns a translated output stream based on output-stream. The translation can thread an arbitrary state from one output operation to the next; the initial state is given by state. Translate-proc must be a procedure that adheres to the following specification:

(translate-proc output-stream state data start count)

This is expected to write the output data in data to output-stream, which is the output stream passed into make-translated-output-stream. State is the translation state associated with the output stream. Data is the data to be written: it is either #f, a blob, or an octet represented as an exact integer. If data is a blob, start is an exact integer representing the starting index of the data to be written within data, and count is the number of octets within data to be written. (Otherwise, the values of start and count are unspecified.)

If data is #f, this means that the stream is being flushed, and the translation procedure should write out any remaining data encoded in state to the output-stream, and possibly synchronize the protocol.

The procedure must return a new state object, which will be passed to the next call to translate-proc. It is recommended that translate-proc not modify state itself, but rather generate a new state object if necessary. Otherwise, the constructor procedure returned by output-stream-writer+constructor may not operate correctly.
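
To make the state threading concrete, here is a sketch of a pass-through translator whose state is the number of octets written so far. It assumes an implementation of this SRFI is loaded; counting-translate and make-counting-output-stream are hypothetical names. The state is an immutable number, so each call returns a fresh state rather than modifying the old one, as recommended above.

```scheme
;; Hypothetical translator: forwards all data unchanged and returns the
;; running octet count as the new state.
(define (counting-translate out state data start count)
  (cond ((not data)
         ;; flush request: nothing is held back in the state
         state)
        ((integer? data)
         ;; a single octet; start and count are unspecified here
         (output-u8 out data)
         (+ state 1))
        (else
         ;; a blob: write COUNT octets beginning at index START
         (output-blob out data start count)
         (+ state count))))

(define (make-counting-output-stream out)
  (make-translated-output-stream out counting-translate 0))
```

The current count is the translation state, which output-stream-underliers returns as its third value.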

(standard-output-stream)

(standard-error-stream)

This returns output streams on the standard output writer and standard error writer, respectively.

Opening files for reading and writing

(open-file-input+output-streams filename file-options): This opens a reader and a writer connected to the file named by filename via open-file-reader+writer (passing it file-options) and returns an input stream and an output stream with unspecified buffering mode connected to them.

Text Transcoding

This part of the SRFI provides pre-packaged functionality for encoding and decoding text in some common encodings. A transcoder is an opaque object encapsulating a specific text encoding. This SRFI specifies how to obtain a transcoder given a text encoder/decoder (or codec for short) and a specified newline encoding. Codecs are constructed by pairing up input and output stream translators.

(transcoder (codec codec) (eol-style eol-style)) (syntax)

This constructs a transcoder object from a specified codec and a specified end-of-line style. The codec and the eol-style clauses are both optional. If present, codec and eol-style must be expressions that evaluate to a codec and an eol-style object, respectively. If not present, the codec defaults to "no codec" (corresponding to UTF-8), and the eol-style object defaults to the platform's standard EOL convention.

Any operands to a transcoder form that do not match the above syntax may be platform-specific extensions. The implementation is free to ignore them, but must not signal an error.

(update-transcoder old (codec codec) (eol-style eol-style)) (syntax)

This form returns a new transcoder object constructed from an old one, with the codec and eol-style fields replaced by the specified values. (Again, the codec and the eol-style clauses are both optional. Also, unrecognized operands can be ignored, but cannot signal an error.)

(transcode-input-stream input-stream transcoder)

This creates a transcoded input stream from input-stream, assuming input-stream has the encoding specified by transcoder. It will translate the data from input-stream into UTF-8 with end-of-line encoded by U+000A.

(transcode-output-stream output-stream transcoder)

This creates a transcoded output stream from output-stream, translating the data fed into output-stream into the encoding specified by transcoder, assuming it is encoded as UTF-8 with end-of-line encoded by U+000A.

(eol-style lf) (syntax)

(eol-style crlf) (syntax)

(eol-style cr) (syntax)

These forms evaluate to end-of-line-style objects: lf stands for using U+000A, crlf stands for using U+000D U+000A, and cr stands for using U+000D as end-of-line.

(make-codec string translate-input translate-output initial-state)

This constructs a codec object. String must be a string naming the encoding. Translate-input must be a translation procedure suitable for use by make-translated-input-stream. Translate-output must be a translation procedure suitable for use by make-translated-output-stream, and initial-state must be a suitable initial state.

latin-1-codec

utf-16le-codec

utf-16be-codec

utf-32le-codec

utf-32be-codec

These are predefined codecs for the ISO 8859-1, UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings.
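
The following sketch shows how the transcoding forms might fit together, assuming an implementation of this SRFI is loaded. The names latin-1/crlf, latin-1/lf and read-legacy-file are hypothetical. Note that the inner (eol-style crlf) is the syntactic form that evaluates to an end-of-line-style object, while the outer eol-style is the clause keyword of the transcoder form.

```scheme
;; A transcoder for ISO 8859-1 text with CR LF line endings.
(define latin-1/crlf
  (transcoder (codec latin-1-codec)
              (eol-style (eol-style crlf))))

;; The same codec, but with LF line endings.
(define latin-1/lf
  (update-transcoder latin-1/crlf
                     (eol-style (eol-style lf))))

;; Hypothetical helper: reads a whole file in the encoding above and
;; returns it as a string with lines ending in U+000A.
(define (read-legacy-file filename)
  (call-with-input-stream
    (transcode-input-stream (open-file-input-stream filename)
                            latin-1/crlf)
    (lambda (in)
      (call-with-values
        (lambda () (input-string-all in))
        (lambda (text rest) (or text ""))))))
```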

Imperative I/O

The Imperative I/O layer provides buffered I/O based on ports. Ports, like streams, allow buffered I/O on the underlying data sources and destinations. The output port abstractions are very similar to the output stream abstractions. However, unlike input streams, input ports are imperative; a read operation destructively removes data from the port. The port layer is very similar, but not identical, to the R5RS I/O system.

It is possible to construct ports from streams. Such a stream port is just a reference cell to a stream. The various procedures constructing ports described in this section are allowed but not required to return stream ports, however; the following section describes the abstractions specific to stream ports.

The Imperative I/O layer introduces one condition type of its own.

(define-condition-type &i/o-port-error &i/o-error i/o-port-error? (port i/o-error-port)): This condition type allows specifying with what particular port an I/O error is associated. The port field has purely informational purpose. Conditions raised by Imperative I/O procedures may include an &i/o-port-error condition, but are not required to do so.

Input ports

(input-port? obj)

This returns #t if the argument is an input port, #f otherwise.

(read-blob-some input-port)

If any data is available in input-port before the next end of file, this returns a freshly allocated blob of non-zero size containing that data, and updates input-port to point exactly past the data read. If an end of file has been reached, it returns #f, and the input port is updated to point just past the end of file.

For a stream input port, this calls input-blob-some on the underlying input stream, updates the underlying input stream to the second return value, and returns input-blob-some's first return value.

(read-u8 input-port)

If an octet is available before the next end of file, this returns that octet as an exact integer, and updates input-port to point exactly past the octet read. If an end of file has been reached, it returns #f, and the input port is updated to point just past the end of file.

For a stream input port, this calls input-u8 on the underlying input stream, updates the underlying input stream to the second return value, and returns input-u8's first return value.

(read-blob-n input-port n)

N must be an exact, non-negative integer, specifying the number of octets to be read. This tries to read n octets. If n or more octets are available before the next end of file, it returns a blob of size n. If fewer octets are available before the next end of file, it returns a blob containing those octets. Subsequently, the input port is updated to point exactly past the data read. If end of file has been reached, this returns #f, and the input port is updated to point just past the end of file.

For a stream input port, this calls input-blob-n on the underlying input stream, updates the underlying input stream to the second return value, and returns input-blob-n's first return value.

(read-blob-n! input-port blob start count)

Count must be an exact, non-negative integer, specifying the number of octets to be read. Blob must be a blob with at least (+ start count) elements. This tries to read count octets. If count or more octets are available before the next end of file, they are written into blob starting at index start, and it returns count. If fewer octets are available before the next end of file, it writes the available octets into blob starting at index start, and it returns the number of octets actually read. In either case, the input port is updated to point exactly past the data read. If end of file has been reached, this returns #f, and updates the input port to point just past the end of file. This procedure will block until either data is available or end of file is reached.

For a stream input port, this calls input-blob-n! on the underlying input stream, updates the underlying input stream to the second return value, and returns input-blob-n!'s first return value.

(read-blob-all input-port)

If data is available before the next end of file, this returns a blob containing all octets until that end of file. If not, read-blob-all returns #f. The input port is updated to point just past the end of file. Note that this function may block indefinitely on ports connected to interactive devices, even though data is available.

For a stream input port, this calls input-blob-all on the underlying input stream, updates the underlying input stream to the second return value, and returns input-blob-all's first return value.

(read-string input-port)

If any data representing a string is available before the next end of file, this returns a string of non-zero size containing the UTF-8 decoding of that data. The input port is updated to point just past the data read. If an end of file has been reached, it returns #f, and the input port is updated to point just past the end of file. This procedure will block until either data is available or end of file is reached.

For a stream input port, this calls input-string on the underlying input stream, updates the underlying input stream to the second return value, and returns input-string's first return value.

(read-char input-port)

If a char is available before the next end of file, this returns that char, and the input port is updated to point past the data read. If an end of file has been reached, this returns #f, and the input port is updated to point just past the end of file. This procedure will block until either data is available or end of file is reached.

For a stream input port, this calls input-char on the underlying input stream, updates the underlying input stream to the second return value, and returns input-char's first return value.

(read-string-n input-port n)

N must be an exact, non-negative integer, specifying the number of chars to be read. It tries to read n chars. If n or more chars are available before the next end of file, it returns a string of size n consisting of those chars. If fewer chars are available before the next end of file, it returns a string containing those chars. In either case, the input port is updated to point exactly past the data read. If end of file has been reached, it returns #f, and the input port is updated to point just past the end of file. This procedure will block until either data is available or end of file is reached.

For a stream input port, this calls input-string-n on the underlying input stream, updates the underlying input stream to the second return value, and returns input-string-n's first return value.

(read-string-n! input-port string start count)

For a stream input port, this calls input-string-n! on the underlying input stream, updates the underlying input stream to the second return value, and returns input-string-n!'s first return value.

(read-string-all input-port)

If data is available before the next end of file, the value returned is a string containing all text until the next end of file. If no data is available, the value is #f. The input port is updated to point just past the end of file. Note that this function may block indefinitely on ports connected to interactive readers, even though data is available.

For a stream input port, this calls input-string-all on the underlying input stream, updates the underlying input stream to the second return value, and returns input-string-all's first return value.

(peek-u8 input-port)

This is the same as read-u8, but does not advance the port.

For a stream input port, this calls input-u8 on the underlying input stream, but does not update the underlying input stream. It returns input-u8's first return value.

(peek-char input-port)

This is the same as read-char, but does not advance the port.

For a stream input port, this calls input-char on the underlying input stream, but does not update the underlying input stream. It returns input-char's first return value.
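
As a sketch of how peek-char and read-char cooperate, the helper below (read-word is a hypothetical name) collects chars up to, but not including, the next space or end of file. It assumes an implementation of this SRFI is loaded. Note that read-char and peek-char as specified here return #f at end of file, which differs from the eof object of R5RS.

```scheme
;; Hypothetical helper: reads chars from PORT until a space or end of
;; file is next, and returns them as a string.
(define (read-word port)
  (let loop ((chars '()))
    (let ((c (peek-char port)))           ; look ahead without consuming
      (if (or (not c) (char=? c #\space))
          (list->string (reverse chars))
          (begin
            (read-char port)              ; now consume the char just seen
            (loop (cons c chars)))))))

(define p (open-string-input-port "hello world"))
(define first-word (read-word p))
```

After the call, p should still deliver the space as its next char, since it was only peeked at.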

(port-eof? input-port)

This returns #t if the port is currently pointing at an end of file, #f otherwise.

For a stream input port, this returns the result of calling stream-eof? on the input stream underlying input-port.

(input-port-position input-port)

This returns the octet position corresponding to the next octet read from the input stream. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream input port on a truncated stream, or a translated stream.

For a stream input port, this calls input-stream-position on the underlying input stream and returns the result.

(set-input-port-position! input-port pos)

Pos must be a non-negative exact integer. This sets the current octet position of input-port to pos. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream input port on a truncated stream, or a translated stream.

For a stream input port, this extracts the underlying reader, sets its position, and sets the stream of the port to a new input stream constructed from the same reader.

(close-input-port input-port)

This closes input-port, rendering the port incapable of delivering data. This has no effect if the port has already been closed. The return values are unspecified.

For a stream input port, this calls close-input-stream on the stream underlying input-port.

(open-file-input-port filename)

(open-file-input-port filename transcoder)

This returns an input port for the named file. The input port may or may not be a stream port. If a transcoder transcoder is specified, the port is appropriately transcoded.

(open-blob-input-port blob)

(open-blob-input-port blob transcoder)

This returns an input port, associated with a blob stream on the blob blob. The input port may or may not be a stream port. If a transcoder transcoder is specified, the port is appropriately transcoded.

(open-string-input-port string)

(open-string-input-port string transcoder)

This returns an input port, associated with a blob stream on the UTF-8 encoding of string string. The input port may or may not be a stream port. If a transcoder transcoder is specified, the port is appropriately transcoded.

(call-with-input-port input-port proc)

This calls proc with input-port as an argument. If proc returns, then the port is closed automatically and the values returned by proc are returned. If proc does not return, then the port will not be closed automatically, unless it is possible to prove that the port will never again be used for a read operation.

(standard-input-port)

This returns an input port connected to standard input, possibly a fresh one on each call. Note that a program should not keep the returned port live for too long without reading from it, as it may be a stream port connected to a standard-input stream, and standard input read in other parts of the program may accumulate buffer space.

Output ports

(output-port? obj)

This returns #t if the argument is an output port, #f otherwise.

(write-blob output-port blob)

(write-blob output-port blob start)

(write-blob output-port blob start count)

Start and count must be non-negative exact integers that default to 0 and (- (blob-length blob) start), respectively. This writes the count octets in blob blob starting at index start to the output port. It is an error if the blob actually has size less than start + count. The return values are unspecified.

For a stream output port, this calls output-blob on the underlying output stream and blob, start and count.

(write-u8 output-port octet)

This writes the octet octet to the output port. The return values are unspecified.

For a stream output port, this calls output-u8 on the underlying output stream and octet.

(write-string-n output-port string)

(write-string-n output-port string start)

(write-string-n output-port string start count)

Start and count must be non-negative exact integers. Start defaults to 0. Count defaults to (- (string-length string) start). This writes the UTF-8 encoding of the substring (substring string start (+ start count)) to the port. The return values are unspecified.

For a stream output port, this calls output-string on the underlying output stream and string, start and count.

(write-char output-port char)

This writes the UTF-8 encoding of the char char to the port. The return values are unspecified.

For a stream output port, this calls output-char on the underlying output stream and char.

(newline output-port)

This is equivalent to (write-char output-port #\newline). The return values are unspecified.

(flush-output-port output-port)

This flushes any output from the buffer of output-port to the underlying data or device. The return values are unspecified.

For a stream output port, this calls flush-output-stream on the underlying output stream.

(output-port-buffer-mode output-port)

This returns the buffer-mode object of output-port.

For a stream output port, this calls output-stream-buffer-mode on the underlying output stream.

(set-output-port-buffer-mode! output-port buffer-mode)

If the current buffer mode of output-port is something other than none and buffer-mode is the none buffer-mode object, this will first flush the output port. Then, it sets the buffer-mode object associated with output-port to buffer-mode. The return values are unspecified.

For a stream output port, this calls set-output-stream-buffer-mode! on the underlying output stream.

(output-port-position output-port)

This returns the position corresponding to the next octet written to the output port. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream output port on a terminated stream, or a translated stream.

For a stream output port, this calls output-stream-position on the underlying output stream and returns the result.

(set-output-port-position! output-port pos)

Pos must be a non-negative exact integer. This flushes the output port and sets its current octet position to pos. This procedure raises an &i/o-operation-not-available-error condition if the port does not support the operation. It is an error to apply this procedure to a closed port, a transcoded port, or a stream output port on a terminated stream, or a translated stream.

For a stream output port, this calls set-output-stream-position! on the underlying output stream with pos and returns whatever it returns.

(close-output-port output-port)

This closes output-port, rendering the port incapable of accepting data. This has no effect if the port has already been closed. The return values are unspecified.

For a stream output port, this calls close-output-stream on the stream underlying output-port.

(open-file-output-port filename file-options)

(open-file-output-port filename file-options transcoder)

This returns an output port for the named file and the specified options. The output port may or may not be a stream port. If a transcoder transcoder is specified, the port is appropriately transcoded.

(call-with-output-blob proc)

(call-with-output-blob proc transcoder)

Proc is a procedure accepting one argument. This creates an unbuffered output port connected to a blob writer, and calls proc with that output port as an argument. The output port may or may not be a stream port. The call to call-with-output-blob returns the blob associated with the port when proc returns. If a transcoder transcoder is specified, the port is appropriately transcoded.

(call-with-output-string proc)

(call-with-output-string proc transcoder)

Proc is a procedure accepting one argument. This creates an unbuffered output port connected to a blob writer, and calls proc with that port as an argument. The output port may or may not be a stream port. The call to call-with-output-string returns the UTF-8 decoding of the blob associated with the port when proc returns. If a transcoder transcoder is specified, the port is appropriately transcoded.
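
A short sketch of the port-level output procedures, assuming an implementation of this SRFI is loaded. The name report is only illustrative. Note that the port is the first argument throughout, unlike in R5RS.

```scheme
(define report
  (call-with-output-string
    (lambda (port)
      (write-string-n port "total: ")
      ;; start = 2, count = 3: intended to write the substring "234"
      (write-string-n port "0123456789" 2 3)
      (write-char port #\!)
      (newline port))))
```

With no transcoder given, report should hold the text total: 234! followed by a newline char.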

(call-with-output-port output-port proc)

This calls proc with output-port as an argument. If proc returns, then the port is closed automatically and the values returned by proc are returned. If proc does not return, then the port will not be closed automatically, unless it is possible to prove that the port will never again be used for a write operation.

(standard-output-port)

(standard-error-port)

Returns a port connected to the standard output or standard error, respectively.

Opening files for reading and writing

(open-file-input+output-ports filename file-options)
(open-file-input+output-ports filename file-options transcoder): This returns an input port and an output port for the named file and the specified options. The ports may or may not be stream ports. If a transcoder transcoder is specified, the ports are appropriately transcoded.

Stream ports

These are the procedures that are specific to stream ports.

Stream input ports

(make-stream-input-port input-stream): This creates a stream input port that points to input-stream.
(stream-input-port? obj): This returns #t if the argument is a stream input port, #f otherwise.
(input-port-stream stream-input-port): This returns the input stream underlying stream-input-port.
(set-input-port-stream! stream-input-port input-stream): This sets the input stream underlying stream-input-port to input-stream.

Stream output ports

(make-stream-output-port output-stream): This creates a stream output port that points to output-stream.
(stream-output-port? obj): This returns #t if the argument is a stream output port, #f otherwise.
(output-port-stream stream-output-port): This returns the output stream underlying stream-output-port.
(set-output-port-stream! stream-output-port output-stream): This sets the output stream underlying stream-output-port to output-stream.
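
The accessors and mutators above are what make dynamic redirection at the ports level possible. A minimal sketch, assuming an implementation of this SRFI is loaded and that port is a stream output port; the helper with-redirected-output is hypothetical and not part of this SRFI:

```scheme
;; Sketch only: assumes an implementation of this SRFI is loaded.
;; `with-redirected-output' is a hypothetical helper, not part of the SRFI.
(define (with-redirected-output port new-stream thunk)
  (let ((old-stream (output-port-stream port)))
    (dynamic-wind
        ;; on entry, point the port at the temporary stream
        (lambda () (set-output-port-stream! port new-stream))
        thunk
        ;; on exit (normal or not), restore the original stream
        (lambda () (set-output-port-stream! port old-stream)))))
```

Any write to port performed while thunk runs goes to new-stream; writes before and after go to the original stream.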

Ports from readers and writers

(open-reader-input-port reader)
(open-reader-input-port reader transcoder): This returns an input port connected to the reader reader. If a transcoder transcoder is specified, the port is appropriately transcoded.
(open-writer-output-port writer buffer-mode)
(open-writer-output-port writer buffer-mode transcoder): This returns an output port connected to the writer writer with buffering according to buffer-mode. If a transcoder transcoder is specified, the port is appropriately transcoded.

Design rationale

Encoding

Many I/O system implementations allow associating an encoding with a port, which permits the direct use of several different encodings with ports. The problem with this approach is that the encoding/decoding defines a mapping from binary data to text or vice versa. Because of this asymmetry, such mappings do not compose. The result is usually complications and restrictions in the I/O API, such as the inability to mix text and binary data, or the inability to change encoding mid-stream.

This SRFI avoids this problem by specifying that textual I/O always uses UTF-8. If the source or target of an I/O stream uses a different encoding, a translated stream is needed, for which this SRFI offers the required facilities. Text decoders and encoders are thus expressed as binary-to-binary mappings, and as such compose.
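
The composition argument can be illustrated outside the I/O system. The following is a self-contained sketch that models octet sequences as plain lists of integers; every name in it is hypothetical and none is part of this SRFI. It is meant only to show that two octets-to-octets mappings can be chained in either order without any type mismatch.

```scheme
;; Sketch only: octet sequences are modelled as plain lists of integers.
;; All names below are hypothetical and not part of this SRFI.

;; Latin-1 -> UTF-8: octets below 128 pass through unchanged,
;; all others expand to a two-octet sequence.
(define (latin-1->utf-8 octets)
  (cond
   ((null? octets) '())
   ((< (car octets) 128)
    (cons (car octets) (latin-1->utf-8 (cdr octets))))
   (else
    (cons (+ 192 (quotient (car octets) 64))
          (cons (+ 128 (remainder (car octets) 64))
                (latin-1->utf-8 (cdr octets)))))))

;; CR/LF -> LF, again from octets to octets.
(define (crlf->lf octets)
  (cond
   ((null? octets) '())
   ((and (= 13 (car octets))
         (pair? (cdr octets))
         (= 10 (cadr octets)))
    (cons 10 (crlf->lf (cddr octets))))
   (else
    (cons (car octets) (crlf->lf (cdr octets))))))

;; Both sides of each mapping are octets, so they compose freely.
(define (compose f g)
  (lambda (x) (f (g x))))

(define decode-dos-latin-1
  (compose crlf->lf latin-1->utf-8))
```

For instance, (decode-dos-latin-1 '(233 13 10 65)) yields (195 169 10 65): the Latin-1 octet 233 becomes the two UTF-8 octets 195 169, and the CR/LF pair collapses to a single LF.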

`display` vs `write`

R5RS calls the procedures for writing something to an output port write-<something>. In a previous revision of this SRFI, all were called display-<something>. R5RS doesn't offer a consistent rule for naming, as the display and write-char procedures behave identically on character arguments, whereas write and write-char do not.

Historically, it seems that the original proposal for the I/O subsystem in RnRS indeed called the procedure display-char. I do not know why it was renamed---probably for compatibility with Common Lisp, which also has write-char.

While the procedures in this SRFI follow a consistent naming scheme, read and write in R5RS do not fit it. The naming scheme proposed here suggests they be called read-datum and write-datum.

`char-ready?`

This SRFI intentionally does not provide char-ready?, which is part of R5RS. The original intention of the procedure seems to have been to interface with something like Unix select(2). With multi-byte encodings such as UTF-8, this is no longer sufficient: the procedure would really have to look at the actual input data in order to determine whether a complete character is actually present. This makes realistic implementations of char-ready? inconsistent with the user's expectations. A procedure byte-ready? would be more consistent. On the other hand, such a procedure is rarely useful in real-world programs, hard to specify to the point where it would be portably usable, and complicates all layers of the I/O system, as readers would have to provide an additional member procedure to enable its implementation. Moreover, a select(2)-like implementation is not possible on all platforms and all types of ports. Consequently, char-ready? and byte-ready? are not part of this SRFI.
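
To make the multi-byte problem concrete, here is a self-contained sketch of the kind of inspection a faithful char-ready? would need. It takes the list of octets currently available and checks whether they hold a whole UTF-8 sequence. The names are hypothetical and not part of this SRFI, and the check is simplified: it looks only at the lead octet's bit pattern and does not validate continuation octets.

```scheme
;; Sketch only: `octets' is a list of the octets currently available.
;; The names below are hypothetical and not part of this SRFI.

;; Number of octets in the UTF-8 sequence introduced by `lead',
;; or #f if `lead' cannot start a sequence.
(define (utf-8-sequence-length lead)
  (cond
   ((< lead 128) 1)   ; 0xxxxxxx
   ((< lead 192) #f)  ; 10xxxxxx: continuation octet
   ((< lead 224) 2)   ; 110xxxxx
   ((< lead 240) 3)   ; 1110xxxx
   ((< lead 248) 4)   ; 11110xxx
   (else #f)))

;; #t only if the available octets cover a whole character's sequence.
(define (complete-char-available? octets)
  (and (pair? octets)
       (let ((needed (utf-8-sequence-length (car octets))))
         (and needed
              (>= (length octets) needed)))))
```

With only the octet 195 available, (complete-char-available? '(195)) is #f even though a select(2)-style test would report that input is ready; once the second octet arrives, (complete-char-available? '(195 169)) is #t.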

`display`

This SRFI does not provide display, which is part of R5RS. Display is woefully underspecified, and mostly used for debug output. It seems display should be replaced by a procedure for formatted output, possibly augmented by handling of dynamically-bound "current ports".

Optional ports and argument order for imperative I/O

The argument order of the Imperative I/O layer is different from R5RS: The port is always at the beginning, and it is mandatory. For a rationale, see the message by Taylor Campbell on the subject.

No distinct end of file object

In R5RS, the distinct type of end of file objects is primarily for the benefit of read, where end of file must be denoted by an object that read cannot normally return as a result of parsing the input. However, it does not seem necessary to drag in the complications of this separate object into the other I/O operations, where #f is perfectly adequate to represent end of file.

Reference Implementation

Here is a tarball containing a reference implementation of this SRFI. It only runs on a version of Scheme 48 that has not been released at the time of writing of this SRFI.

However, its actual dependencies on Scheme 48 idiosyncrasies are few. Chief are its use of the module system, which is easily replaced by another, and the implementation of Unicode. To implement primitive readers and writers on files, the code only relies on suitable library procedures to open the files, and read-byte and write-byte procedures to read or write single bytes from a (R5RS) port, as well as a force-output procedure to flush a port.

The reference implementation has not been highly tuned, but I have spent a modest amount of time making the code deal with buffers in an economic manner. Because of this, the code is more complicated than it needs to be, but hopefully also more usable as a basis for implementing this SRFI in actual Scheme systems.

Examples

Many examples are adapted from The Standard ML Basis Library edited by Emden R. Gansner and John H. Reppy. Cambridge University Press, 2004.

The code makes liberal use of SRFIs 1 (List Library), 11 (Syntax for receiving multiple values), and 26 (Notation for Specializing Parameters without Currying).

The tarball with the reference implementation contains these examples along with test cases for them.

This customized reader reads from a list of blobs. A null blob yields EOF. Procedures for defining streams based on such readers follow.

(define (open-blobs-reader bs)
  (let* ((pos 0))
                   
    (make-simple-reader
     "<octet vectors>"
     bs
     5                                  ; for debugging
     (lambda (blob start count)
       (cond
        ((null? bs)
         0)
        (else
         (let* ((b (car bs))
                (size (blob-length b))
                (real-count (min count (- size pos))))
           (blob-copy! b pos
                       blob start
                       real-count)
           (set! pos (+ pos real-count))
           (if (= pos size)
               (begin
                 (set! bs (cdr bs))
                 (set! pos 0)))
           real-count))))
     ;; pretty rough ...
     (lambda ()
       (if (null? bs)
           0
           (- (blob-length (car bs)) pos)))
     #f #f #f                           ; semantics would be unclear
     (lambda ()
       (set! bs #f)))))                 ; for GC

(define (open-strings-reader strings)
  (open-blobs-reader (map string->utf-8 strings)))

(define (open-blobs-input-stream blobs)
  (open-reader-input-stream (open-blobs-reader blobs)))

(define (open-strings-input-stream strings)
  (open-reader-input-stream (open-blobs-reader (map string->utf-8 strings))))

Create a string via a string output port:

(define three-lines-string
  (call-with-output-string
   (lambda (port)
     (write-string port "foo") (newline port)
     (write-string port "bar") (newline port)
     (write-string port "baz") (newline port))))

Note that, for input streams, the successive streams need to be threaded through the program:

(define (input-two-lines s)
  (let*-values (((line-1 s-2) (input-line s))
                ((line-2 _)   (input-line s-2)))
    (values line-1 line-2)))

There may be life after end of file; hence, the following is not guaranteed to return true:

(define (at-end?/broken s)
  (let ((z (stream-eof? s)))
    (let-values (((a s-2) (input-blob-some s)))
      (let ((x (stream-eof? s-2)))
        (equal? z x)))))

... but this is:

(define (at-end? s)
  (let ((z (stream-eof? s)))
    (let-values (((a s-2) (input-blob-some s)))
      (let ((x (stream-eof? s)))
        (equal? z x)))))

Catch an I/O exception:

(define (open-it filename)
  (guard
   (condition
    ((i/o-error? condition)
     (if (message-condition? condition)
         (begin
           (write-string (standard-error-port)
                         (condition-message condition))
           (newline (standard-error-port))))
     #f))
   (open-file-input-stream filename)))

Read a file directly:

(define (get-contents filename)
  (call-with-input-port (open-file-input-port filename)
    read-blob-all))

Read a file octet-by-octet:

(define (get-contents-2 filename)
  (call-with-input-port (open-file-input-port filename)
    (lambda (port)
      (let loop ((accum '()))
        (let ((thing (read-u8 port)))
          (if (not thing)
              (list->blob (reverse accum))
              (loop (cons thing accum))))))))

(define (list->blob l)
  (let ((blob (make-blob (length l))))
    (let loop ((i 0) (l l))
      (if (null? l)
          blob
          (begin
            (blob-u8-set! blob i (car l))
            (loop (+ 1 i) (cdr l)))))))

Read file chunk-by-chunk:

(define (get-contents-3 filename)
  (call-with-input-port (open-file-input-port filename)
    (lambda (port)
      (let loop ((accum '()))
        (cond
         ((read-blob-some port)
          => (lambda (blob)
               (loop (cons blob accum))))
         (else
          (concatenate-blobs (reverse accum))))))))

(define (concatenate-blobs list)
  (let* ((size (fold + 0 (map blob-length list)))
         (result (make-blob size)))
    (let loop ((index 0)
               (blobs list))
      (if (null? blobs)
          result
          (let* ((b (car blobs))
                 (size (blob-length b)))
            (blob-copy! b 0 result index size)
            (loop (+ index size)
                  (cdr blobs)))))))

Read a file using Stream I/O:

(define (get-contents/stream filename)
  (call-with-input-stream (open-file-input-stream filename)
    (lambda (stream)
      (let-values (((blob _) (input-blob-all stream)))
        blob))))

Read a file octet by octet:

(define (get-contents/stream-2 filename)
  (call-with-input-stream (open-file-input-stream filename)
    (lambda (stream)
      (let loop ((accum '()) (stream stream))
        (let-values (((octet stream) (input-u8 stream)))
          (if (not octet)
              (list->blob (reverse accum))
              (loop (cons octet accum) stream)))))))

Read a file chunk-by-chunk:

(define (get-contents/stream-3 filename)
  (call-with-input-stream (open-file-input-stream filename)
    (lambda (stream)
      (let loop ((accum '()) (stream stream))
        (let-values (((chunk stream) (input-blob-some stream)))
          (if chunk
              (loop (cons chunk accum) stream)
              (concatenate-blobs (reverse accum))))))))

Drop a word at the beginning of a stream selectively:

(define (eat-thousand stream)
  (let-values (((text new-stream)
                (input-string-n stream (string-length "thousand"))))
    (if (string=? text "thousand")
        new-stream
        stream)))

Skip whitespace at the beginning of a stream:

(define (skip-whitespace stream)
  (let-values (((thing new-stream)
                (input-char stream)))
    (cond
     ((not thing) stream)
     ((char-whitespace? thing)
      (skip-whitespace new-stream))
     (else stream))))

Reading a line could be implemented by scanning forward, then reading a chunk from the original position:

(define (my-input-line stream)
  (let count ((n 0) (g stream))
    (let-values (((thing g*) (input-char g)))
      (cond
       ((not thing)
        (if (zero? n)
            (values #f g*)
            (input-string-n stream n)))
       ((char=? #\newline thing)
        (let*-values (((line _) (input-string-n stream n)))
          (values line g*)))
       (else
        (count (+ 1 n) g*))))))

Write some text to a file:

(define (hello myfile)
  (call-with-output-stream (open-file-output-stream myfile (file-options truncate create))
    (lambda (stream)
      (output-string stream "Hello, ")
      (output-string stream "world!")
      (output-char stream #\newline))))

Extract the reader from a stream, read an octet from it, and then reconstruct a stream from it:

(define (after-first filename)
  (let ((stream (open-file-input-stream filename)))
    (call-with-values
        (lambda () (input-stream-reader+constructor stream))
      (lambda (reader construct)
        (let ((b (make-blob 1)))
          (reader-read! reader b 0 1)
          (call-with-input-stream (construct reader)
            (lambda (stream-2)
              (let-values (((contents _) (input-string-all stream-2)))
                contents))))))))

Extract the reader from a stream, set position, and then reconstruct a stream from it:

(define (after-n stream n)
  (call-with-values
      (lambda () (input-stream-reader+constructor stream))
    (lambda (reader construct)
      (reader-set-position! reader n)
      (call-with-input-stream (construct reader)
        (lambda (stream-2)
          (let-values (((contents _) (input-string-all stream-2)))
            contents))))))

Translate CR/LF to LF on input:

(define (translate-crlf-input original-input-stream wish)

  ;; state automaton

  (define (vanilla input-stream count)
    (call-with-values
        (lambda ()
          (input-u8 input-stream))
      (lambda (octet input-stream)
        (cond
         ((not octet) (finish count))
         ((= 13 octet) (cr input-stream count))
         (else (vanilla input-stream (+ 1 count)))))))
            
  (define (cr input-stream count)
    (call-with-values
        (lambda ()
          (input-u8 input-stream))
      (lambda (octet input-stream)
        (cond
         ((not octet) (finish (+ 1 count)))     ; CR hasn't been counted yet
         ((= 10 octet)
          (call-with-values
              (lambda ()
                (input-blob-n original-input-stream (+ 1 count)))
            (lambda (blob _)
              (blob-u8-set! blob count 10)
              (values blob input-stream))))
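         ((= 13 octet) (cr input-stream (+ 1 count))) ; earlier CR is plain data
         (else (vanilla input-stream (+ count 2)))))))   ; count the CR and this octet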

  (define (finish count)
    (if (zero? count)
        (let-values (((_ past-eof) (input-u8 original-input-stream)))
          (values #f past-eof))
        (call-with-values
            (lambda ()
              (input-blob-n original-input-stream count))
          (lambda (blob input-stream)
            (values blob input-stream)))))
          
  (vanilla original-input-stream 0))

(define (make-crlf-translated-input-stream input-stream)
  (make-translated-input-stream input-stream
                                translate-crlf-input))

Translate LF to CR/LF on output:

(define (translate-crlf-output output-stream state data start count)
  (cond
   ((not data))
   ((blob? data)
    (let ((end (+ start count)))
      (let loop ((index start))
        (cond
         ((blob-index data 10 index end)
          => (lambda (lf-index)
               (output-blob output-stream data index (- lf-index index))
               (output-u8 output-stream 13)
               (output-u8 output-stream 10)
               (loop (+ 1 lf-index))))
         (else
          (output-blob output-stream data index (- end index)))))))
   ((= data 10)
    (output-u8 output-stream 13)
    (output-u8 output-stream 10))
   (else
    (output-u8 output-stream data)))
  (unspecific))

(define (blob-index blob octet start end)
  (let loop ((index start))
    (cond
     ((>= index end)
      #f)
     ((= octet (blob-u8-ref blob index))
      index)
     (else
      (loop (+ 1 index))))))

Algorithmic reader producing an infinite stream of blanks:

(define (make-infinite-blanks-reader)
  (make-simple-reader "<blanks, blanks, and more blanks>"
                      #f
                      4096
                      (lambda (blob start count)
                        (let loop ((index 0))
                          (if (>= index count)
                              index
                              (begin
                                (blob-u8-set! blob (+ start index) 32)
                                (loop (+ 1 index))))))
                      (lambda ()
                        1000) ; some number
                      #f #f #f
                      unspecific))

Transcoder round trip:

(define (transcoder-round-trip transcoder text)
  (let* ((coded
          (call-with-blob-output-stream
           (lambda (output-stream)
             (let ((output-stream
                    (transcode-output-stream output-stream transcoder)))
               (output-string output-stream text)))))

         (input-stream (open-blob-input-stream coded))
         (input-stream (transcode-input-stream input-stream transcoder)))
    (let-values (((text _) (input-string-all input-stream)))
      text)))

Decoding UTF-32LE via transcoders:

(define (decode-utf-32le blob)
  (let* ((input-stream (open-blob-input-stream blob))
         (input-stream (transcode-input-stream input-stream
                                               (transcoder (codec utf-32le-codec)))))
    (let-values (((text _) (input-string-all input-stream)))
      text)))

Acknowledgements

Sebastian Egner provided valuable comments on a draft of this SRFI.

References

The Standard ML Basis Library edited by Emden R. Gansner and John H. Reppy. Cambridge University Press, 2004.
The Unicode Home Page

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Francisco Solsona
