by the R6RS editors; John Cowan (shepherd)
This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-181@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
This SRFI is derived from parts of library section 8.2.4, library section 8.2.7, library section 8.2.10, and library section 8.2.13 of the R6RS. These sections are themselves based on parts of SRFI 79, SRFI 80 and SRFI 81. These procedures provide a hook into the Scheme port system from below, allowing the creation of custom ports that behave as much as possible like the standard file, string, and bytevector ports, but that call a procedure to produce data to input ports or to consume data from output ports. Procedures for creating ports that transcode between bytes and characters are an important special case and are also documented in this SRFI.
When reading from (or writing to) files, devices, pipes, sockets, or other sources (or sinks) of data, it's often useful or necessary to perform one or more transformations on the data.
All of these can be done at a level above Scheme ports, but with some loss in convenience. In particular, the high-level Scheme I/O procedures like read, write, and display only accept port arguments. By making it possible to create custom ports that accept a low-level read (write) operation, perform a transformation, and pass it on to some other port, convenience is served. It is also straightforward to chain custom ports together in order to create transformation pipelines.
Examples of such transformations are:
A very important case of transformation is character encoding and decoding. It's sometimes useful or necessary to handle textual data that are encoded differently from the system default encoding. This SRFI provides comprehensive facilities for handling many different encodings by creating a custom textual port on top of a binary port.
Note: The effect of char-ready? and u8-ready? on custom ports is unspecified.
The types of the arguments to the procedures of this section of this SRFI are as follows:
The id is an arbitrary object. It is meant to be used to help the implementation distinguish ports. This SRFI does not provide any mechanism to retrieve the id from a port.
The read! and write! arguments are procedures that behave as specified below.
The get-position and set-position! arguments may be procedures or #f, in which case calls to the SRFI 192 procedures port-position and set-port-position! respectively on the port are errors.
The close argument may be a procedure that takes no arguments and performs any required actions when the port is closed, or #f, in which case no action is taken.
The optional flush argument may be a procedure that takes no arguments and performs any necessary actions when the port is flushed, or #f or omitted, in which case no action is taken.
(make-custom-binary-input-port id read! get-position set-position! close)
Returns a newly created binary input port whose byte source is an arbitrary algorithm represented by the read! procedure.
(read! bytevector start count)
It is an error if the following conditions on the arguments are not met: start is a non-negative exact integer, count is a positive exact integer, and bytevector is a bytevector whose length is at least start + count.
The read! procedure obtains up to count bytes from the byte source, and writes those bytes into bytevector starting at index start. The read! procedure returns an exact integer. This integer represents the number of bytes that it has read. To indicate an end of file, the read! procedure writes no bytes to its bytevector and returns 0.
(get-position)
The get-position procedure returns some implementation-dependent object representing as much of the state of the port at its current position as is necessary to save and restore that position. This value may be useful only as the pos argument to set-position!, if the latter is even supported on the port (see below).
However, if the object is an exact integer, then it is the position measured in bytes, which coincides with the position that the next call of the read! procedure reads from, and can be used to compute a new position some specified number of bytes away.
(set-position! pos)
If pos is the return value of a call to port-position on the port, the position is reset to that point. If pos is an exact integer, the port position is set to the specified number of bytes from the beginning of the port data. What happens for any other value of pos is implementation-defined. If this is not sufficient information to specify the port state, or the specified position is uninterpretable by the port, an error is signaled.
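As a minimal sketch of the read!, get-position, and set-position! protocol described above, the following defines a binary input port backed by a bytevector. The helper name open-bytevector-source is hypothetical, and the library name (srfi 181) is an assumption about how the implementation exposes this SRFI.

```scheme
(import (scheme base) (srfi 181))

;; Hypothetical helper: a binary input port that yields the bytes of `bv`.
(define (open-bytevector-source bv)
  (let ((pos 0))                               ; byte offset of the next read
    (define (read! target start count)
      ;; Copy at most `count` bytes into `target`; returning 0 signals end of file.
      (let ((n (min count (- (bytevector-length bv) pos))))
        (bytevector-copy! target start bv pos (+ pos n))
        (set! pos (+ pos n))
        n))
    (make-custom-binary-input-port
     'bytevector-source                        ; id: an arbitrary object
     read!
     (lambda () pos)                           ; get-position: exact integer offset
     (lambda (new-pos) (set! pos new-pos))     ; set-position!
     #f)))                                     ; close: nothing to release

;; Usage: the standard binary input procedures work on the custom port.
(define source-port (open-bytevector-source (bytevector 1 2 3)))
(read-u8 source-port)
(read-bytevector 10 source-port)
```

Because get-position returns an exact integer here, it is the byte offset that the next call of read! reads from.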
(make-custom-textual-input-port id read! get-position set-position! close)
Returns a newly created textual input port whose character source is an arbitrary algorithm represented by the read! procedure.
(read! string-or-char-vector start count)
It is an error if the following conditions on the arguments are not met: start is a non-negative exact integer, count is a positive exact integer, and string-or-char-vector is either a string or a vector containing characters, at the implementation's option. In either case its length is at least start + count.
The read! procedure obtains up to count characters from the character source, and writes those characters into string-or-char-vector starting at index start. It must be prepared to receive either a string or a vector of characters. The read! procedure returns an exact integer representing the number of characters that it has written. To indicate an end of file, the read! procedure writes no characters to the buffer and returns 0.
(get-position)
The get-position procedure returns an implementation-defined object representing the current position of the input port, which coincides with the position that the next call of the read! procedure reads from.
(set-position! pos)
It is an error if pos does not belong to the correct implementation-defined type. The set-position! procedure sets the position of the input port to pos.
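The following sketch shows a textual read! procedure that accepts either kind of buffer, and also illustrates chaining: the custom port wraps another textual input port and upcases what it reads. The names buffer-set! and open-upcasing-port are hypothetical, and the library name (srfi 181) is an assumption about the implementation.

```scheme
(import (scheme base) (scheme char) (srfi 181))

;; Hypothetical helper: store `c` at index `i` of a buffer that may be
;; either a string or a vector of characters.
(define (buffer-set! buf i c)
  (if (string? buf)
      (string-set! buf i c)
      (vector-set! buf i c)))

;; Hypothetical helper: a textual input port that upcases the characters
;; read from the textual input port `source`.
(define (open-upcasing-port source)
  (define (read! buf start count)
    (let loop ((n 0))
      (if (= n count)
          n
          (let ((c (read-char source)))
            (cond ((eof-object? c) n)          ; n = 0 here signals end of file
                  (else
                   (buffer-set! buf (+ start n) (char-upcase c))
                   (loop (+ n 1))))))))
  (make-custom-textual-input-port
   'upcasing-port
   read!
   #f #f                                       ; positioning is not supported
   (lambda () (close-port source))))           ; closing also closes the source

;; Usage: high-level procedures such as read-line accept the custom port.
(read-line (open-upcasing-port (open-input-string "hello")))
```

Passing #f for get-position and set-position! keeps the sketch short; a positionable variant would have to track the position of the source.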
(make-custom-binary-output-port id write! get-position set-position! close [flush])
Returns a newly created binary output port whose byte sink is an arbitrary algorithm represented by the write! procedure.
(write! bytevector start count)
It is an error if the following conditions on the arguments are not met: start and count are non-negative exact integers, and bytevector is a bytevector whose length is at least start + count.
The write! procedure writes up to count bytes from bytevector starting at index start to the byte sink. The write! procedure returns the number of bytes that it wrote, as an exact integer.
(get-position)
The get-position procedure returns some implementation-dependent object representing as much of the state of the port at its current position as is necessary to save and restore that position. This value may be useful only as the pos argument to set-position!, if the latter is even supported on the port (see below).
However, if the object is an exact integer, then it is the position measured in bytes, which coincides with the position that the next call of the write! procedure writes to, and can be used to compute a new position some specified number of bytes away.
(set-position! pos)
If pos is the return value of a call to port-position on the port, the position is reset to that point. If pos is an exact integer, the port position is set to the specified number of bytes from the beginning of the port data. What happens for any other value of pos is implementation-defined. If this is not sufficient information to specify the port state, or the specified position is uninterpretable by the port, an error is signaled.
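A minimal sketch of a binary output port follows, assuming the library name (srfi 181). The hypothetical helper make-counting-sink discards the data and only counts the bytes that reach write!, which is enough to show the required return value and an integer-valued get-position.

```scheme
(import (scheme base) (srfi 181))

;; Hypothetical helper: returns two values, a binary output port and a
;; thunk reporting how many bytes have reached the sink so far.
(define (make-counting-sink)
  (let ((total 0))
    (define (write! bv start count)
      (set! total (+ total count))
      count)                                   ; report every byte as consumed
    (values
     (make-custom-binary-output-port
      'counting-sink                           ; id: an arbitrary object
      write!
      (lambda () total)                        ; get-position: bytes written so far
      #f                                       ; set-position!: unsupported
      #f)                                      ; close: nothing to release
     (lambda () total))))

;; Usage: flush before inspecting the sink, since the port may buffer.
(let-values (((port bytes-written) (make-counting-sink)))
  (write-bytevector (bytevector 10 20 30) port)
  (flush-output-port port)
  (bytes-written))
```

The optional flush argument is omitted here, so flushing only pushes any data buffered by the port system through write!.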
(make-custom-textual-output-port id write! get-position set-position! close [flush])
Returns a newly created textual output port whose character sink is an arbitrary algorithm represented by the write! procedure.
(write! string-or-char-vector start count)
It is an error if the following conditions on the arguments are not met: start and count are non-negative exact integers, and string-or-char-vector is either a string or a vector containing characters, at the implementation's option. In either case its length is at least start + count.
The write! procedure writes up to count characters from string-or-char-vector starting at index start to the character sink. In any case, the write! procedure returns the number of characters that it wrote, as an exact integer.
(get-position)
The get-position procedure returns an implementation-defined object representing the current position of the output port, which coincides with the position that the next call of the write! procedure writes to.
(set-position! pos)
Pos is a non-negative exact integer. The set-position! procedure sets the position of the output port to pos.
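The textual write! protocol can be sketched as follows, assuming the library name (srfi 181). The hypothetical helpers buffer-ref and make-string-collector are not part of this SRFI; the collector simply accumulates every character it is given, whether the buffer is a string or a vector of characters.

```scheme
(import (scheme base) (srfi 181))

;; Hypothetical helper: read element `i` of a buffer that may be either a
;; string or a vector of characters.
(define (buffer-ref buf i)
  (if (string? buf)
      (string-ref buf i)
      (vector-ref buf i)))

;; Hypothetical helper: returns two values, a textual output port and a
;; thunk that returns everything written to the port so far.
(define (make-string-collector)
  (let ((chars '()))                           ; collected characters, newest first
    (define (write! buf start count)
      (do ((i 0 (+ i 1)))
          ((= i count) count)                  ; all `count` characters consumed
        (set! chars (cons (buffer-ref buf (+ start i)) chars))))
    (values
     (make-custom-textual-output-port
      'string-collector write! #f #f #f)       ; no positioning, no close action
     (lambda () (list->string (reverse chars))))))

;; Usage: flush before inspecting the sink, since the port may buffer.
(let-values (((port contents) (make-string-collector)))
  (write-string "abc" port)
  (flush-output-port port)
  (contents))
```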
(make-custom-binary-input/output-port id read! write! get-position set-position! close [flush])
Returns a newly created binary port that is both an input and an output port. Its byte source and sink are arbitrary algorithms represented by the read! and write! procedures.
Each of the arguments behaves as specified in the description of make-custom-binary-input-port (for read!) or make-custom-binary-output-port (for the other arguments).
Note: R6RS provides custom textual input/output ports (i.e. textual ports that support both input and output), but they are difficult to implement and there are no clear use cases for them, so they have been removed from this SRFI.
(make-file-error obj ...)
Returns an object that satisfies the R7RS-small predicate file-error?. The use of the objs is implementation-defined. Custom ports may raise the result of this procedure from their open procedures.
In order to create a port that transcodes between characters and bytes, it is necessary to have a transcoder available. The following sections explain how to create and use transcoders.
A transcoder is an immutable Scheme object that combines a codec, an end-of-line style, and an error-handling mode (see the following sections for details). Each transcoder represents some specific bidirectional (but not necessarily lossless), possibly stateful translation between byte sequences and the Scheme-level characters and strings allowed by the implementation. Every transcoder can decode bytes as characters and encode characters as bytes.
(make-transcoder codec eol-style handling-mode)
Returns a transcoder with the behavior specified by its arguments.
(native-transcoder)
Returns an implementation-dependent transcoder that represents a possibly locale-dependent “native” transcoding. This should be equivalent to the transcoder employed by Scheme operations that open textual ports.
(transcoded-port binary-port transcoder)
Returns a new textual port with the specified transcoder from binary-port. The new textual port's externally visible state is largely the same as that of binary-port. If binary-port is an input port, the new textual port will be an input port and will decode the bytes of binary-port. If binary-port is an output port, the new textual port will be an output port and will write encoded characters to binary-port.
It is an error to call this procedure on binary-port after it has been read from or written to. It is also an error to read or write on binary-port after calling this procedure.
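As an illustration of layering a textual port over a binary one, the sketch below decodes ISO 8859-1 bytes through a transcoded port. It assumes the library name (srfi 181) and an implementation whose strings can hold the character U+00E9; the variable names are illustrative only.

```scheme
(import (scheme base) (srfi 181))

;; A transcoder for ISO 8859-1 with no line-ending conversion.
(define latin-1 (make-transcoder (latin-1-codec) 'none 'replace))

;; The bytes of "caf" followed by #xE9, which is e-acute in ISO 8859-1.
(define latin-1-bytes (bytevector #x63 #x61 #x66 #xE9))

;; The binary port is handed over to the transcoded port and must not be
;; read from directly afterwards.
(define text-port
  (transcoded-port (open-input-bytevector latin-1-bytes) latin-1))

(read-string 4 text-port)   ; a four-character string ending in U+00E9
```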
(bytevector->string bytevector transcoder)
Returns the string that results from decoding the bytevector according to the input direction of the transcoder.
(string->bytevector string transcoder)
Returns the bytevector that results from encoding the string according to the output direction of the transcoder.
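The two conversion procedures can be combined into a round trip, sketched below under the assumption that the library is available as (srfi 181) and that the implementation supports the character U+03BB. The variable names are illustrative only.

```scheme
(import (scheme base) (srfi 181))

(define utf-8 (make-transcoder (utf-8-codec) 'none 'raise))

;; A one-character string holding U+03BB (Greek small letter lambda).
(define original (string (integer->char #x3BB)))

;; Encode with the output direction, then decode with the input direction.
(define encoded (string->bytevector original utf-8))
(define decoded (bytevector->string encoded utf-8))

(string=? original decoded)
```

The sketch checks only the round trip and makes no claim about the exact bytes produced.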
Several different character encoding schemes exist that describe standard ways to encode characters and strings as byte sequences and to decode those sequences. Within this document, a codec is an immutable Scheme object that represents a specific encoding scheme. A codec has one or more names, represented as strings, and whatever other properties it requires in order to implement specific rules for encoding and decoding.
(make-codec string)
Returns a codec representing the character encoding scheme one of whose names matches the string string case-insensitively.
Some character names, encodings and corresponding algorithms can be found at the WHATWG encoding specification, and implementations should recognize and support all of these that are feasible given space constraints. There are a total of 39 encodings, which have between them 218 standard names. Note that the "replacement" codec signals an error whenever it is used. Additional encodings listed at the IANA page on character sets are not recommended.
If make-codec is called on a string that the implementation does not support, an error satisfying unknown-encoding-error? is signaled.
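One way to use this error is to fall back to a known codec when a name is not recognized. The sketch below assumes the library name (srfi 181); the helper codec-or-utf-8 and the encoding name passed to it are hypothetical.

```scheme
(import (scheme base) (srfi 181))

;; Hypothetical helper: look up a codec by name, falling back to UTF-8
;; when the implementation does not recognize the name.
(define (codec-or-utf-8 name)
  (guard (e ((unknown-encoding-error? e) (utf-8-codec)))
    (make-codec name)))

;; Usage: an unrecognized name yields the UTF-8 codec instead of an error.
(codec-or-utf-8 "no-such-encoding-name")
```

Any other kind of error raised by make-codec is re-raised by guard rather than swallowed.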
(latin-1-codec)
(utf-8-codec)
(utf-16-codec)
These are predefined codecs for the ISO 8859-1, UTF-8, and UTF-16 encoding schemes. When decoding, the implementation must respect any BOM present, but the implementation may assume either endianness if no BOM is present. When encoding, whether a BOM is output and what endianness is used are implementation-dependent.
A call to any of these procedures returns a value that is equal in the sense of eqv? to the result of any other call to the same procedure.
(unknown-encoding-error? obj)
Returns #t if obj is a condition object raised by make-codec or one of a set of implementation-defined objects.
(unknown-encoding-error-name unknown-encoding-obj)
Extracts the name of the unknown encoding from unknown-encoding-obj and returns it as a string. It is an error to mutate this string.
An end-of-line style is a symbol that describes how a textual port transcodes representations of line endings.
In order to conform to this SRFI, implementations must support at least three kinds of line endings: a #\newline character, a #\return character, and a #\return followed by a #\newline, which is known as a CRLF sequence. Note that these match the line endings recognized by the R7RS read-line procedure even when invoked on a non-transcoding port. Implementations may support other line endings as well.
The end-of-line style symbol none means that no line ending conversion is performed in either direction. On an input port, any other symbol will convert any line ending into a #\newline character. On an output port, the symbol crlf causes any line ending to be output as a CRLF sequence, whereas the symbol lf causes any line ending to be output as a #\newline character. All other characters remain unchanged.
Implementations may support additional symbols.
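The input-side conversion can be sketched as follows, assuming the library name (srfi 181). A CRLF sequence in the byte source is read through a transcoded port whose end-of-line style is lf, so the reader should see a single #\newline character; the variable names are illustrative only.

```scheme
(import (scheme base) (srfi 181))

;; Any end-of-line style other than none converts line endings on input.
(define lf-utf-8 (make-transcoder (utf-8-codec) 'lf 'replace))

;; The bytes of "a", CR, LF, "b".
(define crlf-bytes (bytevector #x61 #x0D #x0A #x62))

(define converted
  (read-string 10 (transcoded-port (open-input-bytevector crlf-bytes) lf-utf-8)))

;; The CRLF sequence is expected to arrive as one #\newline character.
(string=? converted (string #\a #\newline #\b))
```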
(native-eol-style)
Returns the default end-of-line style of the underlying platform, typically lf on Unix and crlf on Windows.
An error-handling mode is a symbol that specifies the behavior of textual I/O operations in the presence of encoding or decoding errors.
If a textual input operation encounters an invalid or incomplete character encoding, then if the error-handling mode is replace, the erroneous bytes are treated as the character #\xFFFD, or if that character is not representable by the implementation or is not permitted in strings, as the character #\? (question mark). But if the error-handling mode is raise, an error satisfying i/o-decoding-error? is signaled, an appropriate number of bytes are ignored, and decoding continues with the following bytes.
If a textual output operation encounters a character it cannot encode, and if the error-handling mode is replace, a replacement character is encoded instead, and encoding continues with the next character. The replacement character is #\xFFFD for transcoders that can encode this character, but is #\? (question mark) for transcoders that cannot. But if the error-handling mode is raise, an error satisfying i/o-encoding-error? is raised, and encoding continues with the next character.
Implementations may support additional symbols.
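A minimal sketch of the raise mode on input follows, assuming the library name (srfi 181) and assuming that bytevector->string reports decoding errors in the same way as textual input from a port. The helper decode-or-false is hypothetical.

```scheme
(import (scheme base) (srfi 181))

(define strict-utf-8 (make-transcoder (utf-8-codec) 'none 'raise))

;; Hypothetical helper: decode `bv` as UTF-8, returning #f instead of
;; propagating a decoding error.
(define (decode-or-false bv)
  (guard (e ((i/o-decoding-error? e) #f))
    (bytevector->string bv strict-utf-8)))

(decode-or-false (bytevector #x61))   ; a valid one-byte sequence
(decode-or-false (bytevector #xFF))   ; #xFF never occurs in valid UTF-8
```

With the replace mode the second call would instead return a string containing a replacement character.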
(i/o-decoding-error? obj)
Returns #t if obj is an exception raised when one of the operations for textual input from a port encounters a sequence of bytes that cannot be decoded into a character or string by the port's transcoder. When such an exception is raised, the port's position is past the invalid encoding.
(i/o-encoding-error? obj)
Returns #t if obj is an exception raised when one of the operations for textual output to a port encounters a character that cannot be encoded into bytes by the port's transcoder.
(i/o-encoding-error-char i/o-encoding-condition)
Returns the character that could not be encoded when the condition i/o-encoding-condition was signaled.
Every conforming R6RS implementation, including at least Chez, IronScheme, Larceny, Racket, and Vicare, already provides these procedures in the (rnrs io ports) library, with the exceptions of make-file-error, which will already exist though not necessarily be exposed, and of make-codec, unknown-encoding-error?, and unknown-encoding-error-name. Therefore, no implementation is provided here, especially since a portable implementation is not possible.
However, strictly conforming R6RS implementations will not accept the flush argument, though a wrapper to accept and ignore it would be trivial. Furthermore, the read! and write! procedures will never be passed a vector of characters, but always a string.
This SRFI can be implemented on top of the Chicken procedures make-input-port and make-output-port in the (chicken ports) library. Chicken makes no provisions for getting and setting positions on either its built-in ports or custom ones. It also does not distinguish between textual and binary ports (as permitted by R7RS), and its strings can store binary data; indeed, interpretation as characters is up to a higher-level library such as the utf8 egg.
Shiro Kawai has provided a sample implementation that illustrates both transcoded ports and SRFI 192. It includes a positionable vector-backed custom port library to illustrate the use case of custom ports. The sample implementation, including the examples, can be found in the Git repository for SRFI 192 and in this .tgz archive.
The following considerations must be applied when peek-char and peek-u8 are used on custom ports. Much of the following is derived from Shiro Kawai's README file in the SRFI 192 implementation:
If the source of characters or bytes (collectively known as elements) underlying a custom input port is natively positionable (either because the source is itself a positionable port or because it is a random-access data structure like a list, vector, or string), then the custom port can support the SRFI 192 and R6RS procedures port-position and set-port-position!. This is true even if reading an element from the custom port involves reading a variable number of elements from the source.
If on the other hand the source is not seekable, the get-position procedure can simply return the number of elements read from it so far, whereas the set-position! argument should be set to #f, in which case set-port-position! will be disabled.
However, the Scheme implementation's port system must cache the position of a custom port before peeking it, because it may not be possible for the port to rewind its position. The peeked element is also cached, so that on the next read it can be returned. But if port-position is called before the peeked character is read, the port must return its cached position rather than calling the get-position procedure.
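The caching rule can be modeled in a few lines of plain Scheme. This is a toy model of the port-system side, not taken from any real implementation: the record type peekable and the procedures wrap-source, peek-element, next-element, and current-position are all hypothetical names.

```scheme
(import (scheme base))

;; Toy model: wraps an element reader and a get-position procedure.
(define-record-type peekable
  (make-peekable read-elem get-position peeked cached-pos)
  peekable?
  (read-elem peekable-read-elem)
  (get-position peekable-get-position)
  (peeked peekable-peeked set-peekable-peeked!)            ; #f or a one-element list
  (cached-pos peekable-cached-pos set-peekable-cached-pos!))

(define (wrap-source read-elem get-position)
  (make-peekable read-elem get-position #f #f))

(define (peek-element p)
  (unless (peekable-peeked p)
    ;; Cache the position first: the source may be unable to rewind.
    (set-peekable-cached-pos! p ((peekable-get-position p)))
    (set-peekable-peeked! p (list ((peekable-read-elem p)))))
  (car (peekable-peeked p)))

(define (next-element p)
  (let ((cell (peekable-peeked p)))
    (cond (cell (set-peekable-peeked! p #f)                ; return the cached element
                (car cell))
          (else ((peekable-read-elem p))))))

(define (current-position p)
  ;; While an element is peeked, report the cached position instead of
  ;; asking the source, which has already advanced past that element.
  (if (peekable-peeked p)
      (peekable-cached-pos p)
      ((peekable-get-position p))))
```

With a source that counts the elements it has handed out, current-position still reports the position before the peeked element until that element is actually read.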
Much of the content of this SRFI is drawn from R6RS, which does not have a copyright notice. It does, however, contain the following copyright license:
We intend this report to belong to the entire Scheme community, and so we grant permission to copy it in whole or in part without fee.
For the remaining content, the standard SRFI license applies:
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
This permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.