Title

Primitive I/O

Author

Michael Sperber

Status

This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-79@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2005-10-08
Draft: 2005-11-24
Withdrawn: 2006-11-16

Abstract

This SRFI defines a simple, primitive I/O subsystem for Scheme that is intended to function as the lowest layer of a more comprehensive suite of I/O layers. It provides unbuffered I/O, and is close to what a typical operating system offers. Thus, its interface is suitable for implementing high-throughput and zero-copy I/O.

The Primitive I/O layer also allows clients to implement custom data sources and sinks via a simple interface.

Moreover, this SRFI defines a condition hierarchy specifying common I/O-related exceptional situations.

The Primitive I/O layer only handles blocking I/O. Non-blocking and selective I/O are left for another SRFI.

This I/O layer was designed in conjunction with two other layers that can be built on top of it: SRFI 80 (Stream I/O) and SRFI 81 (Port I/O).

Rationale

The I/O subsystem in R5RS makes it difficult for a Scheme system to implement high-performance binary I/O. It is possible, but at the cost of complicating the interface significantly.

Moreover, R5RS provides no functionality for a program to create its own port types to connect to custom data sources and sinks. Many Scheme implementations provide such interfaces, but they are often complicated and tend to be volatile as implementation aspects of the I/O system change. This is because they build directly on the port model, rather than on a more primitive model such as the one specified here, which makes it comparatively easy to implement custom data sources and sinks.

Specification

Prerequisites

This SRFI is meant for Scheme implementations with support for the following SRFIs:

Filenames

Some of the procedures described here accept a filename filename as an argument. Valid values for such a filename include strings naming a file using the native notation of the operating system the Scheme implementation happens to be running on.

It is expected that a future SRFI will extend this set of values by a more abstract representation: This is necessary, as the most common operating systems do not really use strings for representing filenames, but rather octet or word sequences. Moreover, the string notation is difficult to manipulate and not very portable.

General remarks

For procedures that have no "natural" return value, this SRFI often uses the sentence

The return values are unspecified.

This means that the number of return values and the return values themselves are unspecified. However, the number of return values is such that it is accepted by a continuation created by begin. Specifically, on Scheme implementations where continuations created by begin accept an arbitrary number of arguments (this includes most implementations), it is suggested that the procedure return zero return values.
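As a small illustration of the suggestion above, a procedure can return zero values by calling values with no arguments. The sketch below is written against R7RS small so that it can be run on a current implementation; touch! and counter are hypothetical names, and call-with-values is used only to make the number of returned values observable.

```scheme
(import (scheme base))

;; Hypothetical procedure with no "natural" return value.
(define counter 0)

(define (touch!)
  (set! counter (+ counter 1))
  (values))                       ; return zero values

;; call-with-values passes whatever TOUCH! returns to LIST,
;; so RETURNED is the empty list when zero values come back.
(define returned (call-with-values touch! list))
```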

Data extensions

Condition types

The I/O condition type hierarchy here is similar, but not identical, to the one described in SRFI 36 (I/O Conditions).

The following list depicts the I/O condition hierarchy; more detailed explanations of the condition types follow.

&error
- &i/o-error
  - &i/o-operation-error (has an operation field)
    - &i/o-operation-not-available-error
  - &i/o-read-error
  - &i/o-write-error
  - &i/o-closed-error
  - &i/o-invalid-position-error
  - &i/o-filename-error (has a filename field)
    - &i/o-malformed-filename-error
    - &i/o-file-protection-error
      - &i/o-file-is-read-only-error
    - &i/o-file-already-exists-error
    - &i/o-file-exists-not-error

In exceptional situations not described as "it is an error", the procedures described in the specification below will raise an &i/o-error condition object. Except where explicitly specified, there is no guarantee that the raised condition object will contain all the information that would be applicable. It is recommended, however, that an implementation of this SRFI provide all information about an exceptional situation in the condition object that is available at the place where it is detected.

(define-condition-type &i/o-error &error
  i/o-error?)

This is a supertype for a set of more specific I/O errors.

(define-condition-type &i/o-operation-error &i/o-error
  i/o-operation-error?
  (operation i/o-error-operation))

This condition type specifies an I/O error that occurred during a specific operation. Condition objects belonging to this type must specify the procedure that was called to perform the operation in the operation field. This need not be a procedure directly called by the user program. However, implementations are encouraged to provide a value that will be helpful in determining where the error occurred.

(define-condition-type &i/o-operation-not-available-error &i/o-operation-error
  i/o-operation-not-available-error?)

This condition type indicates that the program tried to perform an I/O operation that was not available.

(define-condition-type &i/o-read-error &i/o-error
  i/o-read-error?)

This condition type specifies a read error that occurred during an I/O operation.

(define-condition-type &i/o-write-error &i/o-error
  i/o-write-error?)

This condition type specifies a write error that occurred during an I/O operation.

(define-condition-type &i/o-invalid-position-error &i/o-error
  i/o-invalid-position-error?
  (position i/o-error-position))

This condition type specifies that an attempt to set the file position specified an invalid position. The value of the position field is the file position that the program intended to set. This condition describes a range error, but not an argument type error.

(define-condition-type &i/o-closed-error &i/o-error
  i/o-closed-error?)

A condition of this type specifies that an operation tried to operate on a closed I/O object under the assumption that it is open.

(define-condition-type &i/o-filename-error &i/o-error
  i/o-filename-error?
  (filename i/o-error-filename))

This condition type specifies an I/O error that occurred during an operation on a named file. Condition objects belonging to this type must specify a file name in the filename field.

(define-condition-type &i/o-malformed-filename-error &i/o-filename-error
  i/o-malformed-filename-error?)

This condition type indicates that a file name had an invalid format.

(define-condition-type &i/o-file-protection-error &i/o-filename-error
  i/o-file-protection-error?)

A condition of this type specifies that an operation tried to operate on a named file with insufficient access rights.

(define-condition-type &i/o-file-is-read-only-error &i/o-file-protection-error
  i/o-file-is-read-only-error?)

A condition of this type specifies that an operation tried to operate on a named read-only file under the assumption that it is writeable.

(define-condition-type &i/o-file-already-exists-error &i/o-filename-error
  i/o-file-already-exists-error?)

A condition of this type specifies that an operation tried to operate on an existing named file under the assumption that it does not exist.

(define-condition-type &i/o-file-exists-not-error &i/o-filename-error
  i/o-file-exists-not-error?)

A condition of this type specifies that an operation tried to operate on a non-existent named file under the assumption that it exists.

File Options

When opening a file, the various procedures in this SRFI accept a file-options object containing a set of flags that specify how the file is to be opened:

(file-options file-options-name ...) (syntax)

The syntax file-options returns a file-options object with the specified options set. The following options (all affecting output only) may be used:

`create`	create file if it does not already exist
`exclusive`	an error will be raised if this option and `create` are both set and the file already exists
`truncate`	file is truncated

Any options not in this list may be platform-specific extensions. The implementation is free to ignore them, but must not signal an error.

(file-options-include? file-options-1 file-options-2)

This returns #t if file-options-1 includes all of the flags listed in file-options-2, #f otherwise.

Examples:

(file-options-include? (file-options create exclusive) (file-options create))
; => #t

(file-options-include? (file-options create) (file-options create exclusive))
; => #f

(file-options? obj)

This returns #t if the argument is a file-options object, #f otherwise.

(file-options-union file-options ...)

This returns a file-options object containing all the flags of its arguments.

Readers and Writers

The objects representing I/O descriptors are called readers for input and writers for output. They are unbuffered and operate purely on binary data.

This layer only specifies a fairly small set of operations --- a subset of the Standard ML Basis PRIM_IO signature. Specifically, all functionality related to non-blocking I/O or polling is missing here. This is intentional, as this functionality should be integrated with the threads system of the underlying implementation, and is thus outside the scope of this SRFI. Instead, it is expected that the set of operations available on primitive I/O readers and writers will be augmented by future specifications, as will be the available constructors for these objects.

The Primitive I/O layer has one condition type specific to readers and writers:

(define-condition-type &i/o-reader/writer-error &i/o-error
  i/o-reader/writer-error?
  (reader/writer i/o-error-reader/writer))

This condition type allows specifying the particular reader or writer with which an I/O error is associated. The reader/writer field serves a purely informational purpose. Conditions raised by Primitive I/O procedures may include an &i/o-reader/writer-error condition, but are not required to do so.

I/O buffers

(make-i/o-buffer size)

This creates a blob of size size with undefined contents. Callers of the Primitive I/O procedures are encouraged to use blobs created by make-i/o-buffer because they might have alignment and placement characteristics that make reader-read! and writer-write! more efficient. (These procedures are still required to work on regular blobs, however.)

Readers

A reader object typically stands for a file or device descriptor, but can also represent the output of some algorithm, such as in the case of string readers. The sequence of octets represented is potentially unbounded, and is punctuated by end of file elements.

(reader? obj)

Returns #t if obj is a reader, otherwise returns #f.

(make-simple-reader id descriptor chunk-size read! available get-position set-position! end-position close)

Returns a reader object. Id is a string naming the reader, provided for informational purposes only. Descriptor is supposed to be the low-level object connected to the reader, such as the OS-level file descriptor or the source string in the case of a string reader.

Chunk-size must be a positive exact integer, and represents a recommended efficient size of the read operations on this reader. This is typically the block size of the buffers of the operating system. As such, it is only a hint for clients of the reader---calls to the read! procedure (see below) may specify a different read count. A value of 1 represents a recommendation to use unbuffered reads.

The remaining arguments are procedures---get-position, set-position!, and end-position may be omitted, in which case the corresponding arguments must be #f.

(read! blob start count)

Start and count must be non-negative exact integers. This reads up to count octets from the reader and writes them into blob, which must be a blob, starting at index start. blob must have at least start + count elements. This procedure returns the number of octets read as an exact integer. It returns 0 if it encounters an end of file, or if count is 0. If count is positive, this procedure blocks until at least one octet has been read or it has encountered end of file.

Blob may or may not be a blob returned by make-i/o-buffer. It is possible that reader-read! operates more efficiently if it is, however.

Count may or may not be the same as the chunk size of the reader. It is possible that reader-read! operates more efficiently if it is, however.

(available)

This returns an estimate of the total number of available octets left in the stream. The return value is either an exact integer, or #f if no such estimate is possible. There is no guarantee that this estimate will have any specific relationship to the true number of available octets.

(get-position)

This procedure, when present, returns the current position in the octet stream as an exact integer counting the number of octets since the beginning of the stream. (EOFs do not count as octets.)

(set-position! pos)

This procedure, when present, moves to position pos (which must be a non-negative exact integer) in the stream.

(end-position)

This procedure, when present, returns the position in the octet stream of the next end of file, without changing the current position.

(close)

This procedure marks the reader as closed, performs any necessary cleanup, and releases the resources associated with the reader. Further operations on the reader may signal an error.
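The following is a minimal sketch of a set of procedures that satisfies the contracts described above for an in-memory source. It is written against R7RS small, with bytevectors standing in for blobs, so that it can be run without an implementation of this SRFI. The helper name bytevector-source-procedures is hypothetical, and a real implementation would raise an &i/o-invalid-position-error condition where this sketch calls error.

```scheme
(import (scheme base))

;; Returns a list of six procedures, in the order in which
;; make-simple-reader expects them, all sharing one cursor into SOURCE.
(define (bytevector-source-procedures source)
  (let ((pos 0)
        (size (bytevector-length source)))
    (list
     ;; read!: copy up to COUNT octets into TARGET starting at START.
     (lambda (target start count)
       (let ((n (min count (- size pos))))
         (bytevector-copy! target start source pos (+ pos n))
         (set! pos (+ pos n))
         n))                            ; 0 signals end of file
     ;; available: for this source the estimate happens to be exact.
     (lambda () (- size pos))
     ;; get-position
     (lambda () pos)
     ;; set-position!: reject positions outside the source.
     (lambda (new-pos)
       (if (<= 0 new-pos size)
           (set! pos new-pos)
           (error "invalid position" new-pos)))
     ;; end-position
     (lambda () size)
     ;; close: nothing to release for an in-memory source.
     (lambda () #f))))

;; Usage: three successive reads into a four-octet buffer.
(define procs (bytevector-source-procedures (bytevector 1 2 3 4 5)))
(define read! (list-ref procs 0))
(define buffer (make-bytevector 4 0))
(define counts
  (let* ((a (read! buffer 0 4))
         (b (read! buffer 0 4))
         (c (read! buffer 0 4)))
    (list a b c)))                      ; => (4 1 0)
```

With an implementation of this SRFI at hand, and blobs in place of bytevectors, such a list could be handed on with (apply make-simple-reader id descriptor chunk-size procs).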

(reader-id reader)

This returns a string naming the reader, provided for informational purposes only. For a file reader returned by open-file-reader or open-file-reader+writer, this will be a string representation of the file name.

For a reader created by make-simple-reader, this returns the value that was supplied as the id argument to make-simple-reader.

(reader-descriptor reader)

This returns a low-level object connected to the reader, such as the OS-level file descriptor or the source string in the case of a string reader.

For a reader created by make-simple-reader, this returns the value that was supplied as the descriptor argument to make-simple-reader.

(reader-chunk-size reader)

This returns a positive exact integer that represents a recommended efficient size of the read operations on this reader. This is typically the block size of the buffers of the operating system. As such, it is only a hint for clients of the reader---calls to the reader-read! procedure (see below) may specify a different read count. A value of 1 represents a recommendation to use unbuffered reads.

For a reader created by make-simple-reader, this returns the value that was supplied as the chunk-size argument to make-simple-reader.

(reader-read! reader blob start count)

Start and count must be non-negative exact integers. This reads up to count octets from the reader and writes them into blob, which must be a blob, starting at index start. blob must have at least start + count elements. This procedure returns the number of octets read as an exact integer. It returns 0 if it encounters an end of file, or if count is 0. If count is positive, this procedure blocks until at least one octet has been read or it has encountered end of file.

Blob may or may not be a blob returned by make-i/o-buffer. It is possible that reader-read! operates more efficiently if it is, however.

Count may or may not be the same as the chunk size of the reader. It is possible that reader-read! operates more efficiently if it is, however.

For a reader created by make-simple-reader, this calls the read! procedure of reader with the remaining arguments.
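Because a read may return fewer octets than requested, a client typically loops until it sees a count of 0. The sketch below shows one way to write such a loop. It is R7RS small code with bytevectors standing in for blobs, and it takes the read operation as a procedure argument so that it runs without an implementation of this SRFI; read-all and make-short-read! are hypothetical helpers.

```scheme
(import (scheme base))

;; Drains a source through READ!, a procedure with the
;; (blob start count) interface, using a buffer of CHUNK-SIZE octets.
;; Returns all octets read as one bytevector.
(define (read-all read! chunk-size)
  (let ((buffer (make-bytevector chunk-size 0)))
    (let loop ((chunks '()))
      (let ((n (read! buffer 0 chunk-size)))
        (if (= n 0)                     ; 0 means end of file
            (apply bytevector-append (reverse chunks))
            ;; Only the first N octets of the buffer are valid.
            (loop (cons (bytevector-copy buffer 0 n) chunks)))))))

;; Stand-in source that hands out at most two octets per call,
;; to show that short reads are handled.
(define (make-short-read! source)
  (let ((pos 0))
    (lambda (target start count)
      (let ((n (min 2 count (- (bytevector-length source) pos))))
        (bytevector-copy! target start source pos (+ pos n))
        (set! pos (+ pos n))
        n))))

(define result
  (read-all (make-short-read! (bytevector 10 20 30 40 50)) 4))
```

Against a real reader, the first argument would presumably be (lambda (blob start count) (reader-read! reader blob start count)) and the second (reader-chunk-size reader).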

(reader-available reader)

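This returns an estimate of the total number of available octets left in the stream. The return value is either an exact integer, or #f if no such estimate is possible. There is no guarantee that this estimate will have any specific relationship to the true number of available octets.

For a reader created by make-simple-reader, this calls the available procedure of reader.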

(reader-has-get-position? reader)

This returns #t if reader supports the reader-get-position procedure, and #f otherwise.

(reader-get-position reader)

When reader-has-get-position? returns #t for reader, this returns the current position in the octet stream as an exact integer counting the number of octets since the beginning of the stream.

For a reader created by make-simple-reader, this calls the get-position procedure of reader, if present. It is an error to call this procedure if reader does not have a get-position procedure.

(reader-has-set-position!? reader)

This returns #t if reader supports the reader-set-position! operation, and #f otherwise.

(reader-set-position! reader pos)

When reader-has-set-position!? returns #t for reader, this moves to position pos (which must be a non-negative exact integer) in the stream.

For a reader created by make-simple-reader, this calls the set-position! procedure of reader with the pos argument, if present. It is an error to call this procedure if reader does not have a set-position! procedure.

(reader-has-end-position? reader)

This returns #t if reader supports the reader-end-position operation, and #f otherwise.

(reader-end-position reader)

When reader-has-end-position? returns #t for reader, this returns the position in the octet stream of the next end of file, without changing the current position.

For a reader created by make-simple-reader, this calls the end-position procedure of reader, if present. It is an error to call this procedure if reader does not have an end-position procedure.

(reader-close reader)

This marks the reader as closed, performs any necessary cleanup, and releases the resources associated with the reader. Further operations on the reader may signal an error.

For a reader created by make-simple-reader, this calls the close procedure of reader.

(open-blob-reader blob)

This returns a reader that uses a copy of blob, a blob, as its contents. This reader has get-position, set-position!, and end-position operations.

(open-file-reader filename)

(open-file-reader filename file-options)

This returns a reader connected to the file named by filename. The file-options object defaults to (file-options) if not present. It may determine various aspects of the returned reader; see the section on file options. This reader may or may not have get-position, set-position!, and end-position operations.

(standard-input-reader)

This returns a reader connected to the standard input. The meaning of "standard input" is implementation-dependent.

Writers

A writer object typically stands for a file or device descriptor, but generally represents a sink that will process the data written to it in some arbitrary way.

(writer? obj)

Returns #t if obj is a writer, otherwise returns #f.

(make-simple-writer id descriptor chunk-size write! get-position set-position! end-position close)

Returns a writer object. Id is a string naming the writer, provided for informational purposes only. For a file, this will be a string representation of the file name. Descriptor is supposed to be the low-level object connected to the writer, such as the OS-level file descriptor.

Chunk-size must be a positive exact integer, and is the recommended efficient size of the write operations on this writer. As such, it is only a hint for clients of the writer---calls to the write! procedure (see below) may specify a different write count. A value of 1 represents a recommendation to use unbuffered writes.

The remaining arguments are procedures --- get-position, set-position!, and end-position may be omitted, in which case the corresponding arguments must be #f.

(write! blob start count)

Start and count must be non-negative exact integers. This writes up to count octets in blob blob starting at index start. Before it does this, it will block until it can write at least one octet. It returns the number of octets actually written as a positive exact integer.

Blob may or may not be a blob returned by make-i/o-buffer. It is possible that writer-write! operates more efficiently if it is, however.

Count may or may not be the same as the chunk size of the writer. It is possible that writer-write! operates more efficiently if it is, however.

(get-position)

This procedure, when present, returns the current position in the octet stream as an exact integer counting the number of octets since the beginning of the stream.

(set-position! pos)

This procedure, when present, moves to position pos (which must be a non-negative exact integer) in the stream.

(end-position)

This procedure, when present, returns the octet position of the next end of file, without changing the current position.

(close)

This procedure marks the writer as closed, performs any necessary cleanup, and releases the resources associated with the writer. Further operations on the writer may signal an error.

(writer-id writer)

This returns a string naming the writer, provided for informational purposes only. For a file writer returned by open-file-writer or open-file-reader+writer, this will be a string representation of the file name.

For a writer created by make-simple-writer, this returns the value of the id field of the argument writer.

(writer-descriptor writer)

This returns a low-level object connected to the writer, such as the OS-level file descriptor.

For a writer created by make-simple-writer, this returns the value of the descriptor field of the argument writer.

(writer-chunk-size writer)

This returns a positive exact integer that represents the recommended efficient size of the write operations on this writer. As such, it is only a hint for clients of the writer---calls to writer-write! (see below) may specify a different write count. A value of 1 represents a recommendation to use unbuffered writes.

For a writer created by make-simple-writer, this returns the value of the chunk-size field of the argument writer.

(writer-write! writer blob start count)

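Start and count must be non-negative exact integers. This writes up to count octets in blob blob starting at index start. Before it does this, it will block until it can write at least one octet. It returns the number of octets actually written as a positive exact integer.

Blob may or may not be a blob returned by make-i/o-buffer. It is possible that writer-write! operates more efficiently if it is, however.

Count may or may not be the same as the chunk size of the writer. It is possible that writer-write! operates more efficiently if it is, however.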

For a writer created by make-simple-writer, this calls the write! procedure of writer with the remaining arguments.
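Since a write may accept fewer octets than requested, a client that needs all of its data written has to retry with the remainder. The sketch below shows one possible loop. It is R7RS small code with bytevectors standing in for blobs, and it takes the write operation as a procedure argument so that it runs without an implementation of this SRFI; write-all! and make-short-sink are hypothetical helpers.

```scheme
(import (scheme base))

;; Keeps calling WRITE! until all COUNT octets of DATA, starting at
;; index START, have been accepted.
(define (write-all! write! data start count)
  (let loop ((start start) (remaining count))
    (if (> remaining 0)
        (let ((n (write! data start remaining)))   ; may be a short write
          (loop (+ start n) (- remaining n))))))

;; Stand-in sink accepting at most three octets per call.  Returns two
;; values: a write! procedure and a thunk that yields a newly allocated
;; bytevector holding everything written so far.
(define (make-short-sink)
  (let ((chunks '()))
    (values
     (lambda (data start count)
       (let ((n (min 3 count)))
         (set! chunks
               (cons (bytevector-copy data start (+ start n)) chunks))
         n))
     (lambda () (apply bytevector-append (reverse chunks))))))

;; Write octets 2 through 6 of an eight-octet bytevector.
(define written
  (call-with-values make-short-sink
    (lambda (write! contents)
      (write-all! write! (bytevector 1 2 3 4 5 6 7 8) 2 5)
      (contents))))
```

Against a real writer, the write! argument would presumably be (lambda (blob start count) (writer-write! writer blob start count)).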

(writer-has-get-position? writer)

This returns #t if writer supports the writer-get-position operation, and #f otherwise.

(writer-get-position writer)

When writer-has-get-position? returns #t for writer, this returns the current position in the octet stream as an exact integer counting the number of octets since the beginning of the stream.

For a writer created by make-simple-writer, this calls the get-position procedure of writer, if present. It is an error to call this procedure if writer does not have a get-position procedure.

(writer-has-set-position!? writer)

This returns #t if writer supports the writer-set-position! operation, and #f otherwise.

(writer-set-position! writer pos)

When writer-has-set-position!? returns #t for writer, this moves to position pos (which must be a non-negative exact integer) in the stream.

For a writer created by make-simple-writer, this calls the set-position! procedure of writer with the pos argument, if present. It is an error to call this procedure if writer does not have a set-position! procedure.

(writer-has-end-position? writer)

This returns #t if writer supports the writer-end-position operation, and #f otherwise.

(writer-end-position writer)

When writer-has-end-position? returns #t for writer, this returns the octet position of the next end of file, without changing the current position.

For a writer created by make-simple-writer, this calls the end-position procedure of writer, if present. It is an error to call this procedure if writer does not have an end-position procedure.

(writer-close writer)

This marks the writer as closed, performs any necessary cleanup, and releases the resources associated with the writer. Further operations on the writer may signal an error.

For a writer created by make-simple-writer, this calls the close procedure of writer.

(open-blob-writer)

This returns a writer that can yield everything written to it as a blob. This writer has get-position, set-position!, and end-position operations.

(writer-blob writer)

The writer argument must be a blob writer returned by open-blob-writer. This procedure returns a newly allocated blob containing the data written to writer in sequence. Doing this in no way invalidates the writer or changes its store.

(open-file-writer filename)

(open-file-writer filename file-options)

This returns a writer connected to the file named by filename. The file-options object defaults to (file-options) if not present. It determines various aspects of the returned writer; see the section on file options. This writer may or may not have get-position, set-position!, and end-position operations.

(standard-output-writer)

This returns a writer connected to the standard output. The meaning of "standard output" is implementation-dependent.

(standard-error-writer)

This returns a writer connected to the standard error. The meaning of "standard error" is implementation-dependent.

Opening files for reading and writing

(open-file-reader+writer filename)

(open-file-reader+writer filename file-options)

This returns two values, a reader and a writer connected to the file named by filename. The file-options object defaults to (file-options) if not present. It determines various aspects of the returned writer and possibly the reader; see the section on file options. This writer may or may not have get-position, set-position!, and end-position operations.

Note: This procedure enables opening a file for simultaneous input and output in environments where it is not possible to call open-file-reader and open-file-writer on the same file.

Reference Implementation

Here is a tarball containing a reference implementation of this SRFI, along with implementations for SRFI 80 (Stream I/O), SRFI 81 (Port I/O), and SRFI 82 (Stream-Port I/O). It only runs on a version of Scheme 48 that has not been released at the time of writing of this SRFI.

However, its actual dependencies on Scheme 48 idiosyncrasies are few. Chief among them are its use of the module system, which is easily replaced by another, and the implementation of Unicode. To implement primitive readers and writers on files, the code only relies on suitable library procedures to open the files, and read-byte and write-byte procedures to read or write single bytes from an (R5RS) port, as well as a force-output procedure to flush a port.

Examples

The tarball with the reference implementation contains these examples along with test cases for them.

This customized reader reads from a list of blobs. A null blob yields EOF. Procedures for defining streams based on such readers follow.

(define (open-blobs-reader bs)
  (let* ((pos 0))                       ; offset into the current blob
    (make-simple-reader
     "<octet vectors>"
     bs
     5                                  ; for debugging
     (lambda (blob start count)
       (cond
        ((null? bs)
         0)
        (else
         (let* ((b (car bs))
                (size (blob-length b))
                (real-count (min count (- size pos))))
           (blob-copy! b pos
                       blob start
                       real-count)
           (set! pos (+ pos real-count))
           (if (= pos size)
               (begin
                 (set! bs (cdr bs))
                 (set! pos 0)))
           real-count))))
     ;; pretty rough ...
     (lambda ()
       (if (null? bs)
           0
           (- (blob-length (car bs)) pos)))
     #f #f #f                           ; semantics would be unclear
     (lambda ()
       (set! bs #f)))))                 ; for GC

(define (open-strings-reader strings)
  (open-blobs-reader (map string->utf-8 strings)))

(define (open-blobs-input-stream blobs)
  (open-reader-input-stream (open-blobs-reader blobs)))

(define (open-strings-input-stream strings)
  (open-reader-input-stream (open-blobs-reader (map string->utf-8 strings))))

Algorithmic reader producing an infinite stream of blanks:

(define (make-infinite-blanks-reader)
  (make-simple-reader "<blanks, blanks, and more blanks>"
                      #f
                      4096
                      (lambda (blob start count)
                        (let loop ((index 0))
                          (if (>= index count)
                              index
                              (begin
                                (blob-u8-set! blob (+ start index) 32)
                                (loop (+ 1 index))))))
                      (lambda ()
                        1000) ; some number
                      #f #f #f
                      unspecific))

Acknowledgements

Sebastian Egner provided valuable comments on a draft of this SRFI. The posters to the SRFI 68 (Comprehensive I/O) mailing list provided many very valuable comments. Donovan Kolbly did thorough pre-draft editing. Any remaining mistakes are mine.

References

The Standard ML Basis Library, edited by Emden R. Gansner and John H. Reppy. Cambridge University Press, 2004.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Donovan Kolbly