Title

Stream I/O

Author

Michael Sperber

Status

This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-80@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2005-10-08
Draft: 2005-11-24
Withdrawn: 2006-11-20

Abstract

This SRFI defines an I/O layer for lazy, mostly functional buffered streams.

The layer architecture is similar to the upper three layers of the I/O subsystem in The Standard ML Basis Library.

In particular, this layer provides

buffered reading and writing
arbitrary lookahead
dynamic redirection of input or output
binary and text I/O, mixed if needed
translated data streams
the ability to create I/O streams from arbitrary readers and writers

It builds on the Primitive I/O layer specified in SRFI 79 (Primitive I/O).

Rationale

The I/O subsystem in R5RS is too limited in many respects: It only provides for text I/O, it only allows reading at the character and the datum level, and some of the primitives are mis-designed or underspecified. As a result, almost every Scheme implementation has its own extensions to the I/O system, and rarely are two of these extensions compatible.

This SRFI is meant as one possible compelling replacement for the R5RS I/O subsystem. As such, it is a completely new design, and it is not based on the extensions a particular existing Scheme system provides. (In fact, it is probably, in its entirety, unlike what any existing Scheme system provides.) Moreover, it is meant to be a substrate for further extensions which can be built on top of the subsystem via the interface described here.

The Port I/O layer specified in SRFI 81 (Port I/O) provides an alternative such replacement. The Port I/O layer is closer to R5RS. However, the Stream I/O layer is more powerful and more expressive in several ways.

The design of this SRFI is driven by the requirements mentioned in the abstract on the one hand, and by the excellent design of the I/O subsystem in the Standard ML Basis Library on the other. The latter is also the reason why this SRFI is different from the extensions provided by any existing Scheme implementation, as none of them picked up on the Basis design, and because the Basis design seems superior to the extensions I have looked at. (Among those I have looked at are Scheme 48, scsh, Gambit-C, and PLT Scheme.)

Note, however, that this SRFI differs from the SML Basis in several important respects, among them the handling of textual I/O streams, the ability to define translated streams, and the absence of any functionality related to non-blocking I/O. The latter is more properly in the domain of a thread/event system; Concurrent ML shows that the SML Basis plays well with such a system, and I expect the same to hold true here. The text encoding/translation functionality is different mainly because it plays to a different substrate for representing text (based on Unicode; see below) than Standard ML.

Like the Standard ML Basis I/O subsystem, the I/O system specified in this SRFI is not suitable for maximal-throughput I/O, chiefly because it does not re-use buffers. However, I deemed the achievable performance more than adequate for most applications---it seemed a small price to pay for the convenient programming model.

Specification

Prerequisites

This SRFI is meant for Scheme implementations with the following prerequisites:

Unicode support

This SRFI assumes that the char datatype in Scheme corresponds to Unicode scalar values. This, in turn, means that strings are represented as vectors of scalar values. (Note that this is consistent with SRFI 14 (Character-set library) and SRFI 75 (R6RS Unicode data).) It may be possible to make this SRFI work in an ASCII- or Latin-1-only system, but I have not made any special provisions to ensure this.

Filenames

Filenames in this SRFI are the same as in SRFI 79 (Primitive I/O).

General remarks

For procedures that have no "natural" return value, this SRFI often uses the sentence

The return values are unspecified.

This means that the number of return values and the return values themselves are unspecified. However, the number of return values is such that it is accepted by a continuation created by begin. Specifically, on Scheme implementations where continuations created by begin accept an arbitrary number of arguments (this includes most implementations), it is suggested that the procedure return zero return values.

Blobs

The specification frequently refers to blobs. These are as specified in SRFI 74 (Octet-Addressed Binary Blocks).

File options

File options are as in SRFI 79 (Primitive I/O).

Buffer modes

Each output stream has an associated buffer mode that defines when an output operation will flush the buffer associated with the output stream. The possible buffer modes are none for no buffering, line for flushing upon newlines, and block for block-based buffering.

While this SRFI does not require buffer modes to form a distinct type, implementors are encouraged to make them a distinct type.

(buffer-mode name) (syntax): Name must be one of the identifiers none, line, and block. This returns a buffer-mode object denoting the associated buffer mode. There is only one such object for each mode, so a program can compare them using eq?.
(buffer-mode? obj): This returns #t if the argument is a buffer-mode object, #f otherwise.

Text Transcoders

Transcoders provide pre-packaged functionality for encoding and decoding text in some common encodings. A transcoder is an opaque object encapsulating a specific text encoding. This SRFI specifies how to obtain a transcoder given a text encoder/decoder (or codec for short) and a specified newline encoding. Codecs are constructed by pairing up input and output stream translators.

(transcoder (codec codec) (eol-style eol-style)) (syntax)

This constructs a transcoder object from a specified codec and a specified end-of-line style. The codec and the eol-style clauses are both optional. If present, codec and eol-style must be expressions that evaluate to a codec and an eol-style object, respectively. If not present, the codec defaults to "no codec" (corresponding to UTF-8), and the eol-style object defaults to the platform's standard EOL convention.

Any operands to a transcoder form that do not match the above syntax may be platform-specific extensions. The implementation is free to ignore them, but must not signal an error.

(update-transcoder old (codec codec) (eol-style eol-style)) (syntax)

This form returns a new transcoder object constructed from an old one, with the codec and eol-style fields replaced by the specified values. (Again, the codec and the eol-style clauses are both optional. Also, unrecognized operands can be ignored, but cannot signal an error.)

(eol-style lf) (syntax)

(eol-style crlf) (syntax)

(eol-style cr) (syntax)

These forms evaluate to end-of-line-style objects - lf stands for using U+000A, crlf stands for using U+000D U+000A, and cr stands for using U+000D as end-of-line.

(make-codec string translate-input translate-output initial-state)

This constructs a codec object. String must be a string naming the encoding. Translate-input must be a translation procedure suitable for use by make-translated-input-stream. Translate-output must be a translation procedure suitable for use by make-translated-output-stream, and initial-state must be a suitable initial state.

latin-1-codec

utf-16le-codec

utf-16be-codec

utf-32le-codec

utf-32be-codec

These are predefined codecs for the ISO 8859-1, UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings.

Input and Output Streams

The Stream I/O layer defines high-level I/O operations on two new datatypes: input streams and output streams. These operations include binary and textual I/O. Input streams are treated in lazy functional style: input from a stream s yields an object representing the input itself, and a new input stream s1. S will continue to represent exactly the same position within the input; to advance within the stream, the program needs to perform input from s1. Consequently, input streams allow arbitrary lookahead, which is especially convenient for all kinds of scanning.
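The lazy functional style can be illustrated with a small model in portable Scheme. This is only a sketch and not part of the Stream I/O layer: it assumes an "input stream" is simply the list of octets not yet consumed, and the names model-input-u8, model-peek-2, and model-stream are hypothetical. It ignores buffering, blocking, and the possibility of data after an end of file.

```scheme
;; Model only: an "input stream" is the list of octets not yet consumed.
;; Like input-u8, this returns two values: the octet (or #f at end of
;; file) and a new stream; the argument S itself is never modified.
(define (model-input-u8 s)
  (if (null? s)
      (values #f s)
      (values (car s) (cdr s))))

;; Look two octets ahead without disturbing S.
(define (model-peek-2 s)
  (call-with-values
      (lambda () (model-input-u8 s))
    (lambda (first s-1)
      (call-with-values
          (lambda () (model-input-u8 s-1))
        (lambda (second s-2)
          (list first second))))))

(define model-stream (list 1 2 3))

(model-peek-2 model-stream)             ; => (1 2)

;; MODEL-STREAM still denotes the original position.
(call-with-values
    (lambda () (model-input-u8 model-stream))
  (lambda (octet rest) octet))          ; => 1
```

Because reading never advances the stream it is applied to, a scanner can keep the original stream, look ahead as far as it likes, and resume from whichever of the returned streams matches the amount of input it decides to consume.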

Output streams are more conventional, as the lazy functional style does not make sense with output.

Both input streams and output streams are either directly connected to underlying readers and writers, or are defined by translation on an underlying stream. This makes it possible to perform trivial transformations such as CR/LF translation, but also transparent recoding on the streams.

Textual I/O always uses UTF-8 as the underlying encoding. Other encodings can easily be supported by translating to or from UTF-8 using the translation framework. If a decoding error occurs, the implicit decoder will skip the octet starting the character encoding, yield a ? character, and attempt to continue decoding after that initial octet.
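The error-recovery rule above can be sketched as a model decoder in portable Scheme. This is an illustration under stated assumptions, not the reference implementation: it works on a list of octets and returns a list of scalar values as exact integers (63 is the code of the ? character), all procedure names are hypothetical, and for brevity it does not reject overlong forms, surrogates, or values above U+10FFFF.

```scheme
;; Model only: decode a list of octets into a list of scalar values.

(define (continuation? octet)
  (and (>= octet #x80) (<= octet #xBF)))

;; Number of continuation octets expected after LEAD, or #f if LEAD
;; cannot start an encoded character.
(define (expected-continuations lead)
  (cond ((< lead #x80) 0)
        ((and (>= lead #xC2) (<= lead #xDF)) 1)
        ((and (>= lead #xE0) (<= lead #xEF)) 2)
        ((and (>= lead #xF0) (<= lead #xF4)) 3)
        (else #f)))

;; Payload bits carried by the lead octet of a sequence with N
;; continuation octets.
(define (lead-payload lead n)
  (case n
    ((0) lead)
    ((1) (- lead #xC0))
    ((2) (- lead #xE0))
    (else (- lead #xF0))))

;; Consume N continuation octets from REST, accumulating into ACC.
;; Returns (scalar-value . remaining-octets), or #f on failure.
(define (take-continuations rest n acc)
  (cond ((= n 0) (cons acc rest))
        ((and (pair? rest) (continuation? (car rest)))
         (take-continuations (cdr rest)
                             (- n 1)
                             (+ (* acc 64) (- (car rest) #x80))))
        (else #f)))

(define (decode-utf-8/model octets)
  (let loop ((octets octets) (out '()))
    (if (null? octets)
        (reverse out)
        (let* ((lead (car octets))
               (n (expected-continuations lead))
               (result (and n
                            (take-continuations (cdr octets)
                                                n
                                                (lead-payload lead n)))))
          (if result
              (loop (cdr result) (cons (car result) out))
              ;; Decoding error: skip only the initial octet, yield ?,
              ;; and continue decoding right after that octet.
              (loop (cdr octets) (cons 63 out)))))))

(decode-utf-8/model '(#x41 #xC3 #xA9 #x42))   ; => (65 233 66)
(decode-utf-8/model '(#x41 #xFF #x42))        ; => (65 63 66)
```

Note that only the offending initial octet is skipped, so a truncated multi-octet sequence yields one ? for the lead octet and then one for each orphaned continuation octet that follows it.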

The Stream I/O layer adds an additional condition type to the condition types specified in SRFI 79 (Primitive I/O).

(define-condition-type &i/o-stream-error &i/o-error
  i/o-stream-error?
  (stream i/o-error-stream))

The stream field has a purely informational purpose. Conditions raised by Stream I/O procedures may include an &i/o-stream-error condition, but are not required to do so.

Input streams

Input streams come in two flavors: either directly based on a reader, or based on another input stream via translation. Input streams are in one of three states: active, truncated, or closed. When initially created, a stream is active. A program can retrieve the reader underlying an input stream---this automatically incurs disconnecting the stream from the reader, and puts the stream into the truncated state. When explicitly closed, the reader underlying an open input stream is closed as well. The closed state implies the truncated state.

Reading from a truncated stream is not an error; after all the existing buffers have been exhausted, the stream behaves as if an infinite sequence of end of files followed.

(input-stream? obj)

This returns #t if the argument is an input stream, #f otherwise.

(input-blob-some input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If any data is available before the next end of file, this returns a freshly allocated blob of non-zero size containing that data. If an end of file has been reached, the value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-u8 input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If an octet is available before the next end of file, this returns that octet as an exact integer. If an end of file has been reached, the value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-blob-n input-stream n)

N must be an exact, non-negative integer, specifying the number of octets to be read. This returns two values: a value and another input stream. It tries to read n octets. If n or more octets are available before the next end of file, it returns a blob of size n. If fewer octets are available before the next end of file, it returns a blob containing those octets. The input stream returned points exactly past the data read. If end of file has been reached, the return value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-blob-n! input-stream blob start count)

Count must be an exact, non-negative integer, specifying the number of octets to be read. Blob must be a blob with at least (+ start count) elements. This returns two values: a value and another input stream. It tries to read count octets. If count or more octets are available before the next end of file, they are written into blob starting at index start, and it returns count as the value. If fewer octets are available before the next end of file, it writes the available octets into blob starting at index start, and it returns the number of octets actually read as the value. The input stream returned points exactly past the data read. If end of file has been reached, the return value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-blob-all input-stream)

This returns two values: a value and another input stream. If data is available before the next end of file, the value is a blob containing all octets until that end of file. If not, the value is #f. The input stream returned points just past the end of file. Note that this function may block indefinitely on streams connected to interactive readers, even though data is available.

(input-string input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If any data representing a string is available before the next end of file, this returns a string of non-zero size containing the UTF-8 decoding of that data as the first return value. If an end of file has been reached, it returns #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-char input-stream)

This returns two values: a value and another input stream. The input stream returned points exactly past the data read. If a char is available before the next end of file, this returns that char. If an end of file has been reached, the value is #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-string-n input-stream n)

N must be an exact, non-negative integer, specifying the number of chars to be read. This returns two values: a value and another input stream. The input stream returned points exactly past the data read. It tries to read n chars. If n or more chars are available before the next end of file, it returns a string of size n consisting of those chars. If fewer chars are available before the next end of file, it returns a string containing those chars. If end of file has been reached, it returns #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-string-n! input-stream string start count)

Count must be an exact, non-negative integer, specifying the number of chars to be read. This returns two values: a value and another input stream. The input stream returned points exactly past the data read. It tries to read count chars. If count or more chars are available before the next end of file, they are written into string starting at index start, and it returns count as the value. If fewer chars are available before the next end of file, it writes the available chars into string starting at index start, and it returns the number of chars actually read as the value. If end of file has been reached, it returns #f, and the input stream returned points just past the end of file. This procedure will block until either data is available or end of file is reached.

(input-string-all input-stream)

This returns two values: a value and another input stream. If data is available before the next end of file, the value returned is a string that contains all text until the next end of file. If no data is available, the value is #f. The input stream returned points just past the end of file. Note that this function may block indefinitely on streams connected to interactive readers, even though data is available.

(input-line input-stream)

This returns two values: a value and another input stream. If data is available before the next newline char, the value is a string that contains all text until the newline char. The input stream returned points just past the newline char. If end of file has been reached, the value is #f.

(stream-eof? input-stream)

This returns #t if the stream has reached end of file, #f otherwise.

(input-stream-position input-stream)

For reader-based input streams, this returns the reader position corresponding to the next octet read from the input stream. This procedure raises an &i/o-operation-not-available-error condition if the stream does not support the operation. It is an error to apply this procedure to a truncated or closed stream, or to a translated stream.

(input-stream-underliers input-stream)

Input-stream must be an open input stream. This returns two values. If the stream is translated, the first value is the underlying stream, and the second value is the translator procedure. If the stream is based on a reader, this returns the reader as the first value and #f as the second. Moreover, input-stream is put into the truncated state.

Note that, in the case of a translated stream, the returned underlying stream may already point past the data read by operations on input-stream due to buffering.

(input-stream-reader+constructor input-stream)

Input-stream must be an open input stream. This returns two values: a reader and a procedure of one argument. The reader is the underlying reader of the stream at the end of the chain of translations whose head is input-stream. The procedure consumes a reader as its argument and returns a fresh input stream with the same chain of translations as input-stream. This also disconnects the input stream from the reader and puts it into the truncated state; all other input streams based on the input stream at the end of the translation chain (directly or indirectly) are also put into the truncated state.

(close-input-stream input-stream)

This closes the underlying reader if input-stream is still open, and marks the input stream as closed. Applying close-input-stream to a closed stream has no effect. Closing an input stream also closes all input streams that are translations (directly or indirectly) of the input stream at the end of its own translation chain. The return values are unspecified.

(open-file-input-stream filename)

(open-file-input-stream filename file-options)

This opens a reader connected to the file named by filename, passing it file-options if present, and returns an input stream connected to it.

(open-blob-input-stream blob)

This opens a blob reader connected to blob and returns an input stream connected to it.

(open-string-input-stream string)

This opens a blob reader connected to the UTF-8 encoding of string and returns an input stream connected to it.

(open-reader-input-stream reader)

(open-reader-input-stream reader blob)

This returns an input stream connected to the reader reader.

If blob is present, the stream will use it as its initial buffer contents. Subsequently writing to blob directly has unspecified consequences.

(call-with-input-stream input-stream proc)

This calls proc with input-stream as an argument. If proc returns, then the stream is closed automatically and the values returned by proc are returned. If proc does not return, then the stream will not be closed automatically, unless it is possible to prove that the stream will never again be used for a read operation.

(make-translated-input-stream input-stream translate-proc)

(make-translated-input-stream input-stream translate-proc blob)

This returns a translated input stream based on input-stream. Translate-proc must be a procedure that adheres to the following specification:

(translate-proc input-stream wish): Input-stream is the underlying input stream originally passed to make-translated-input-stream. Wish is either #f, #t, or an exact, non-negative integer, giving a hint how much data is requested. #f means a chunk of arbitrary size, suggesting that the user program called input-blob-some, #t means as much as possible, suggesting that the user program called input-blob-all, and an integer specifies the requested number of octets. Note that translate-proc can ignore wish. The procedure must return two values, a blob and another input stream, analogous to the various input-... procedures. A value of #f in place of the blob denotes an end of file. The returned input stream points just past the data returned.

If blob is present, the stream will use it as its initial (translated) buffer contents. Subsequently writing to blob directly has unspecified consequences.

(transcode-input-stream input-stream transcoder)

This creates a transcoded input stream from input-stream, assuming input-stream has the encoding specified by transcoder. It will translate the data from input-stream into UTF-8 with end-of-line encoded by U+000A.

(standard-input-stream)

This returns a freshly created stream connected to the standard input reader. Note that a program should not keep the returned stream live, as input read from standard input in other parts of the program would then accumulate in buffer space.

Output streams

Output streams, like input streams, come in two flavors: either directly based on a writer, or based on another output stream via translation.

Output streams are in one of three states: active, terminated, or closed. When initially created, a stream is active. A program can retrieve the writer underlying an output stream---this automatically incurs disconnecting the stream from the writer, and puts the stream into the terminated state. When explicitly closed, the writer underlying an output stream enters the closed state. The closed state implies the terminated state.

It is an error to perform an output operation on a terminated stream.

(output-stream? obj)

This returns #t if the argument is an output stream, #f otherwise.

(output-blob output-stream blob start count)

(output-blob output-stream blob start)

(output-blob output-stream blob)

Start and count must be non-negative exact integers that default to 0 and (- (blob-length blob) start), respectively. This writes the count octets in blob blob starting at index start to the output stream. It is an error if the blob actually has size less than start + count. The return values are unspecified.

(output-u8 output-stream octet)

This writes the octet octet (which must be an exact integer in the range [0,255]) to the stream. The return values are unspecified.

(output-char output-stream char)

This writes the UTF-8 encoding of the char char to the stream. The return values are unspecified.

(output-string output-stream string start count)

(output-string output-stream string start)

(output-string output-stream string)

Start and count must be non-negative exact integers that default to 0 and (- (string-length string) start), respectively. This writes the UTF-8 encoding of the substring (substring string start (+ start count)) to the stream. The return values are unspecified.

(flush-output-stream output-stream)

This flushes any output from the buffer of output-stream to the underlying writer. It is a no-op if output-stream is terminated. The return values are unspecified.

(output-stream-position output-stream)

For writer-based output streams, this returns the writer position corresponding to the next octet written to the output stream. This procedure raises an &i/o-operation-not-available-error condition if the stream does not support the operation. It is an error to apply this procedure to a terminated or closed stream, or to a translated stream.

(set-output-stream-position! output-stream pos)

Pos must be a non-negative exact integer. This flushes the output stream and sets the current position of underlying writer to pos. This procedure raises an &i/o-operation-not-available-error condition if the stream does not support the operation. It is an error to apply this procedure to a terminated or closed stream, or to a translated stream. The return values are unspecified.

(output-stream-underliers output-stream)

Output-stream must be an open output stream. First, this flushes output-stream. This returns three values: If output-stream is a translated stream, the first value is the underlying stream, the second value the translation procedure, and the third value the translation state of the stream. If it is directly based on a writer, the first return value is the writer; the second and the third are #f.

(output-stream-writer+constructor output-stream)

Output-stream must be an open output stream. First, this flushes output-stream. This returns two values: a writer and a procedure of one argument. The writer is the underlying writer of the stream at the end of the chain of translations whose head is output-stream. The procedure consumes a writer as its argument and returns a fresh output stream with the same chain of translations as output-stream, where each translation is in the same state as in the chain. This also disconnects the output stream from the writer and puts it into the terminated state; all other output streams based on the output stream at the end of the translation chain (directly or indirectly) are also put into the terminated state.

(close-output-stream output-stream)

This closes the underlying writer if output-stream is still open, and marks the output stream as closed. Applying close-output-stream to a closed stream has no effect. Closing an output stream also closes all output streams that are translations (directly or indirectly) of the output stream at the end of its own translation chain. The return values are unspecified.

(output-stream-buffer-mode output-stream)

This returns the buffer-mode object of output-stream.

(set-output-stream-buffer-mode! output-stream buffer-mode)

If the current buffer mode of output-stream is something other than none and buffer-mode is the none buffer-mode object, this will first flush the output stream. Then, it sets the buffer-mode object associated with output-stream to buffer-mode. The return values are unspecified.

(open-file-output-stream filename)

(open-file-output-stream filename file-options)

This opens a writer connected to the file named by filename via open-file-writer (passing it file-options, which defaults to (file-options)) and returns an output stream with unspecified buffering mode connected to it.

(open-writer-output-stream writer buffer-mode)

This returns an output stream connected to the writer writer with buffering according to buffer-mode.

(call-with-blob-output-stream proc)

Proc is a procedure accepting one argument. This creates an unbuffered output stream connected to a blob writer, and calls proc with that output stream as an argument. The call to call-with-blob-output-stream returns the blob associated with the stream when proc returns.

(call-with-string-output-stream proc)

Proc is a procedure accepting one argument. This creates an unbuffered output stream connected to a blob writer, and calls proc with that output stream as an argument. The call to call-with-string-output-stream returns the UTF-8 decoding of the blob associated with the stream when proc returns.

(call-with-output-stream output-stream proc)

This calls proc with output-stream as an argument. If proc returns, then the stream is closed automatically and the values returned by proc are returned. If proc does not return, then the stream will not be closed automatically, unless it is possible to prove that the stream will never again be used for a write operation.

(make-translated-output-stream output-stream translate-proc state)

This returns a translated output stream based on output-stream. The translation can thread an arbitrary state from one output operation to the next; the initial state is given by state. Translate-proc must be a procedure that adheres to the following specification:

(translate-proc output-stream state data start count)

This is expected to write the output data in data to output-stream, which is the output stream passed into make-translated-output-stream. State is the translation state associated with the output stream. Data is the data to be written: it is either #f, a blob, or an octet represented as an exact integer. If data is a blob, start is an exact integer representing the starting index of the data to be written within data, and count is the number of octets within data to be written. (Otherwise, the values of start and count are unspecified.)

If data is #f, this means that the stream is being flushed, and the translation procedure should write out any remaining data encoded in state to the output-stream, and possibly synchronize the protocol.

The procedure must return a new state object, which will be passed to the next call to translate-proc. It is recommended that translate-proc not modify state itself, but rather generate a new state object if necessary. Otherwise, the constructor procedure returned by output-stream-writer+constructor may not operate correctly.
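The way state is threaded through an output translation, and what happens on a flush, can be sketched with a model in portable Scheme. The following is an assumption-laden illustration rather than code against the actual interface: the hypothetical procedure crlf->lf/model takes a one-argument procedure sink in place of output-stream (standing in for writing one octet with output-u8), and a vector of octets stands in for a blob. The state is #t when a CR has been seen but not yet written, so that a CR LF pair split across two output operations is still collapsed into a single LF; the initial state would be #f.

```scheme
;; Model only: SINK stands in for (lambda (octet) (output-u8 stream octet)),
;; and a vector of octets stands in for a blob.
;; STATE is #t when a CR has been seen but not yet written.
(define (crlf->lf/model sink state data start count)
  ;; Process one octet; returns the new state.
  (define (step pending-cr? octet)
    (cond ((= octet 13)
           (if pending-cr? (sink 13))   ; the previous CR was a lone CR
           #t)
          ((= octet 10)
           (sink 10)                    ; CR LF and bare LF both become LF
           #f)
          (else
           (if pending-cr? (sink 13))
           (sink octet)
           #f)))
  (cond ((not data)                     ; flush: write out a held-back CR
         (if state (sink 13))
         #f)
        ((vector? data)                 ; blob-like data: use START and COUNT
         (let loop ((i start) (pending-cr? state))
           (if (= i (+ start count))
               pending-cr?
               (loop (+ i 1) (step pending-cr? (vector-ref data i))))))
        (else (step state data))))      ; a single octet
```

The procedure never mutates its state argument; it returns a fresh state value each time, in line with the recommendation above.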

(transcode-output-stream output-stream transcoder)

This creates a transcoded output stream from output-stream, translating the data fed into output-stream into the encoding specified by transcoder, assuming it is encoded as UTF-8 with end-of-line encoded by U+000A.

(standard-output-stream)

(standard-error-stream)

This returns output streams on the standard output writer and standard error writer, respectively.

Opening files for reading and writing

(open-file-input+output-streams filename)
(open-file-input+output-streams filename file-options): This opens a reader and a writer connected to the file named by filename via open-file-reader+writer (passing it file-options, which defaults to (file-options)) and returns an input stream and an output stream with unspecified buffering mode connected to them.

Design rationale

Encoding

Many I/O system implementations allow associating an encoding with a port, allowing the direct use of several different encodings with ports. The problem with this approach is that the encoding/decoding defines a mapping from binary data to text or vice versa. Because of this asymmetry, such mappings do not compose. The result is usually complications and restrictions in the I/O API, such as the inability to mix text and binary data.

This SRFI avoids this problem by specifying that textual I/O always uses UTF-8. This means that, if the target or source of an I/O stream is to use a different encoding, a translated stream needs to be used, for which this SRFI offers the required facilities. This means that text decoders or encoders are expressed as binary-to-binary mappings, and as such compose.
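The composition argument can be made concrete with a small model in portable Scheme. This sketch is not part of the SRFI: it assumes translations are plain procedures from octet lists to octet lists, and the names latin-1->utf-8/model, crlf->lf/list-model, and compose2 are hypothetical. Since both mappings have the same input and output type, they can be chained freely.

```scheme
;; Model only: each translation maps a list of octets to a list of octets.

;; Re-encode ISO 8859-1 octets as UTF-8 octets.
(define (latin-1->utf-8/model octets)
  (let loop ((octets octets) (out '()))
    (cond ((null? octets) (reverse out))
          ((< (car octets) #x80)
           (loop (cdr octets) (cons (car octets) out)))
          (else
           ;; Values #x80..#xFF need two UTF-8 octets; OUT is built in
           ;; reverse, so the trailing octet is consed on last.
           (loop (cdr octets)
                 (cons (+ #x80 (remainder (car octets) 64))
                       (cons (+ #xC0 (quotient (car octets) 64))
                             out)))))))

;; Replace each CR LF pair by a single LF.
(define (crlf->lf/list-model octets)
  (cond ((null? octets) '())
        ((and (= (car octets) 13)
              (pair? (cdr octets))
              (= (cadr octets) 10))
         (cons 10 (crlf->lf/list-model (cddr octets))))
        (else
         (cons (car octets) (crlf->lf/list-model (cdr octets))))))

;; Binary-to-binary mappings compose like ordinary functions.
(define (compose2 f g)
  (lambda (octets) (f (g octets))))

(define latin-1-crlf->utf-8-lf/model
  (compose2 crlf->lf/list-model latin-1->utf-8/model))

(latin-1-crlf->utf-8-lf/model '(#xE9 13 10))   ; => (195 169 10)
```

A decoder typed as octets-to-characters could not be followed by a second octet-level translation in this way, which is the asymmetry the paragraph above describes.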

No distinct end of file object

In R5RS, the distinct type of end of file objects is primarily for the benefit of read, where end of file must be denoted by an object that read cannot normally return as a result of parsing the input. However, it does not seem necessary to drag in the complications of this separate object into the other I/O operations, where #f is perfectly adequate to represent end of file.

Reference Implementation

Here is a tarball containing a reference implementation of this SRFI, along with implementations for SRFI 79 (Primitive I/O), SRFI 81 (Port I/O), and SRFI 82 (Stream-Port I/O). It only runs on a version of Scheme 48 that has not been released at the time of writing this SRFI.

However, its actual dependencies on Scheme 48 idiosyncrasies are few. Chief are its use of the module system, which is easily replaced by another, and the implementation of Unicode. To implement primitive readers and writers on files, the code only relies on suitable library procedures to open the files, and read-byte and write-byte procedures to read or write single bytes from a (R5RS) port, as well as a force-output procedure to flush a port.

The reference implementation has not been highly tuned, but I have spent a modest amount of time making the code deal with buffers in an economical manner. Because of this, the code is more complicated than it needs to be, but hopefully also more usable as a basis for implementing this SRFI in actual Scheme systems.

Examples

Many examples are adapted from The Standard ML Basis Library edited by Emden R. Gansner and John H. Reppy. Cambridge University Press, 2004.

The code makes liberal use of SRFI 1 (List Library), SRFI 11 (Syntax for receiving multiple values), and SRFI 26 (Notation for Specializing Parameters without Currying).

The tarball with the reference implementation contains these examples along with test cases for them.

For input streams, the successive streams need to be threaded through the program:

(define (input-two-lines s)
  (let*-values (((line-1 s-2) (input-line s))
                ((line-2 _)   (input-line s-2)))
    (values line-1 line-2)))

There may be life after end of file; hence, the following is not guaranteed to return true:

(define (at-end?/broken s)
  (let ((z (stream-eof? s)))
    (let-values (((a s-2) (input-blob-some s)))
      (let ((x (stream-eof? s-2)))
        (equal? z x)))))

... but this is:

(define (at-end? s)
  (let ((z (stream-eof? s)))
    (let-values (((a s-2) (input-blob-some s)))
      (let ((x (stream-eof? s)))
        (equal? z x)))))

Catch an I/O exception:

(define (open-it filename)
  (guard
   (condition
    ((i/o-error? condition)
     (if (message-condition? condition)
         (begin
           (write-string (standard-error-port)
                         (condition-message condition))
           (newline (standard-error-port))))
     #f))
   (open-file-input-stream filename)))

Read a file using Stream I/O:

(define (get-contents/stream filename)
  (call-with-input-stream (open-file-input-stream filename)
    (lambda (stream)
      (let-values (((blob _) (input-blob-all stream)))
        blob))))

Read a file octet by octet:

(define (get-contents/stream-2 filename)
  (call-with-input-stream (open-file-input-stream filename)
    (lambda (stream)
      (let loop ((accum '()) (stream stream))
        (let-values (((octet stream) (input-u8 stream)))
          (if (not octet)
              (list->blob (reverse accum))
              (loop (cons octet accum) stream)))))))

Read a file chunk-by-chunk:

(define (get-contents/stream-3 filename)
  (call-with-input-stream (open-file-input-stream filename)
    (lambda (stream)
      (let loop ((accum '()) (stream stream))
        (let-values (((chunk stream) (input-blob-some stream)))
          (if chunk
              (loop (cons chunk accum) stream)
              (concatenate-blobs (reverse accum))))))))

(define (concatenate-blobs list)
  (let* ((size (fold + 0 (map blob-length list)))
         (result (make-blob size)))
    (let loop ((index 0)
               (blobs list))
      (if (null? blobs)
          result
          (let* ((b (car blobs))
                 (size (blob-length b)))
            (blob-copy! b 0 result index size)
            (loop (+ index size)
                  (cdr blobs)))))))

Drop a word at the beginning of a stream selectively:

(define (eat-thousand stream)
  (let-values (((text new-stream)
                (input-string-n stream (string-length "thousand"))))
    (if (and text (string=? text "thousand")) ; text may be #f at end of stream
        new-stream
        stream)))

Skip whitespace at the beginning of a stream:

(define (skip-whitespace stream)
  (let-values (((thing new-stream)
                (input-char stream)))
    (cond
     ((not thing) stream)
     ((char-whitespace? thing)
      (skip-whitespace new-stream))
     (else stream))))

Reading a line could be implemented by scanning forward, then reading a chunk from the original position:

(define (my-input-line stream)
  (let count ((n 0) (g stream))
    (let-values (((thing g*) (input-char g)))
      (cond
       ((not thing)
        (if (zero? n)
            (values #f g*)
            (input-string-n stream n)))
       ((char=? #\newline thing)
        (let*-values (((line _) (input-string-n stream n)))
          (values line g*)))
       (else
        (count (+ 1 n) g*))))))

Write some text to a file:

(define (hello myfile)
  (call-with-output-stream (open-file-output-stream myfile (file-options truncate create))
    (lambda (stream)
      (output-string stream "Hello, ")
      (output-string stream "world!")
      (output-char stream #\newline))))

Extract the reader from a stream, read an octet from it, and then reconstruct a stream from it:

(define (after-first filename)
  (let ((stream (open-file-input-stream filename)))
    (call-with-values
        (lambda () (input-stream-reader+constructor stream))
      (lambda (reader construct)
        (let ((b (make-blob 1)))
          (reader-read! reader b 0 1)
          (call-with-input-stream (construct reader)
            (lambda (stream-2)
              (let-values (((contents _) (input-string-all stream-2)))
                contents))))))))

Extract the reader from a stream, set position, and then reconstruct a stream from it:

(define (after-n stream n)
  (call-with-values
      (lambda () (input-stream-reader+constructor stream))
    (lambda (reader construct)
      (reader-set-position! reader n)
      (call-with-input-stream (construct reader)
        (lambda (stream-2)
          (let-values (((contents _) (input-string-all stream-2)))
            contents))))))

Translate CR/LF to LF on input:

(define (translate-crlf-input original-input-stream wish)

  ;; state automaton

  (define (vanilla input-stream count)
    (call-with-values
        (lambda ()
          (input-u8 input-stream))
      (lambda (octet input-stream)
        (cond
         ((not octet) (finish count))
         ((= 13 octet) (cr input-stream count))
         (else (vanilla input-stream (+ 1 count)))))))
            
  (define (cr input-stream count)
    (call-with-values
        (lambda ()
          (input-u8 input-stream))
      (lambda (octet input-stream)
        (cond
         ((not octet) (finish (+ 1 count)))     ; CR hasn't been counted yet
         ((= 10 octet)
          (call-with-values
              (lambda ()
                (input-blob-n original-input-stream (+ 1 count)))
            (lambda (blob _)
              (blob-u8-set! blob count 10)
              (values blob input-stream))))
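         ((= 13 octet) (cr input-stream (+ 1 count))) ; earlier CR is data; the new CR is pending
         (else (vanilla input-stream (+ 2 count)))))))  ; count the CR and the octet after it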

  (define (finish count)
    (if (zero? count)
        (let-values (((_ past-eof) (input-u8 original-input-stream)))
          (values #f past-eof))
        (call-with-values
            (lambda ()
              (input-blob-n original-input-stream count))
          (lambda (blob input-stream)
            (values blob input-stream)))))
          
  (vanilla original-input-stream 0))

(define (make-crlf-translated-input-stream input-stream)
  (make-translated-input-stream input-stream
                                translate-crlf-input))

Translate LF to CR/LF on output:

(define (translate-crlf-output output-stream state data start count)
  (cond
   ((not data))
   ((blob? data)
    (let ((end (+ start count)))
      (let loop ((index start))
        (cond
         ((blob-index data 10 index end)
          => (lambda (lf-index)
               (output-blob output-stream data index (- lf-index index))
               (output-u8 output-stream 13)
               (output-u8 output-stream 10)
               (loop (+ 1 lf-index))))
         (else
          (output-blob output-stream data index (- end index)))))))
   ((= data 10)
    (output-u8 output-stream 13)
    (output-u8 output-stream 10))
   (else
    (output-u8 output-stream data)))
  (unspecific))

(define (blob-index blob octet start end)
  (let loop ((index start))
    (cond
     ((>= index end)
      #f)
     ((= octet (blob-u8-ref blob index))
      index)
     (else
      (loop (+ 1 index))))))
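When testing an output translator such as the one above, it can help to have an independent whole-buffer version to compare against. The following is a minimal sketch, assuming R7RS bytevectors instead of blobs; lf->crlf is a hypothetical helper and not part of this SRFI.

```scheme
(import (scheme base))

;; Return a fresh bytevector in which every LF (10) in BV
;; has been replaced by CR (13) followed by LF.
(define (lf->crlf bv)
  (let loop ((i 0) (acc '()))
    (if (= i (bytevector-length bv))
        (apply bytevector (reverse acc))
        (let ((o (bytevector-u8-ref bv i)))
          (loop (+ i 1)
                (if (= o 10)
                    (cons 10 (cons 13 acc)) ; acc is reversed: CR first, then LF
                    (cons o acc)))))))

(lf->crlf (bytevector 97 10 98))   ; => the octets 97 13 10 98
```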

Transcoder round trip:

(define (transcoder-round-trip transcoder text)
  (let* ((coded
          (call-with-blob-output-stream
           (lambda (output-stream)
             (let ((output-stream
                    (transcode-output-stream output-stream transcoder)))
               (output-string output-stream text)))))

         (input-stream (open-blob-input-stream coded))
         (input-stream (transcode-input-stream input-stream transcoder)))
    (let-values (((text _) (input-string-all input-stream)))
      text)))

Decoding UTF-32LE via transcoders:

(define (decode-utf-32le blob)
  (let* ((input-stream (open-blob-input-stream blob))
         (input-stream (transcode-input-stream input-stream
                                               (transcoder (codec utf-32le-codec)))))
    (let-values (((text _) (input-string-all input-stream)))
      text)))
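For comparison, the decoding that the UTF-32LE codec performs can be written out by hand. This sketch assumes R7RS and an implementation whose characters cover the Unicode scalar values involved; utf-32le->string is a hypothetical name, it works on a whole bytevector rather than a stream, and it does no validation beyond the length check.

```scheme
(import (scheme base))

;; Decode a bytevector holding UTF-32LE code units into a string.
;; Each character is four octets, least significant octet first.
(define (utf-32le->string bv)
  (let ((len (bytevector-length bv)))
    (unless (zero? (modulo len 4))
      (error "UTF-32LE input length is not a multiple of 4" len))
    (let loop ((i 0) (chars '()))
      (if (= i len)
          (list->string (reverse chars))
          (let ((scalar (+ (bytevector-u8-ref bv i)
                           (* 256 (bytevector-u8-ref bv (+ i 1)))
                           (* 65536 (bytevector-u8-ref bv (+ i 2)))
                           (* 16777216 (bytevector-u8-ref bv (+ i 3))))))
            (loop (+ i 4) (cons (integer->char scalar) chars)))))))

(utf-32le->string (bytevector 72 0 0 0 105 0 0 0))   ; => "Hi"
```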

Acknowledgements

Sebastian Egner provided valuable comments on a draft of this SRFI. The posters to the SRFI 68 (Comprehensive I/O) discussion list provided many very valuable comments. Donovan Kolbly did thorough pre-draft editing. Any remaining mistakes are mine.

References

The Standard ML Basis Library edited by Emden R. Gansner and John H. Reppy. Cambridge University Press, 2004.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Donovan Kolbly