
binary vs non-binary ports

This page is part of the web mail archives of SRFI 56 from before July 7th, 2015. The new archives for SRFI 56 contain all messages, not just those from before July 7th, 2015.

From the draft:
> Some Schemes may wish to distinguish between binary and non-binary
> ports as in Common-Lisp. As these can be layered on top of the
> current ports this may better be relegated to a separate SRFI.

Huh?  This is backwards.  The current ports are character ports.
As such they are layered on top of byte ports.  I.e. non-binary
ports are layered on top of binary ports.

Remember that one character may be many bytes.
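
A small illustration of that point, written in Python rather than Scheme
only so it can be run as-is. The sample characters are arbitrary choices,
not anything from the draft:

```python
# One character can occupy a different number of bytes depending on the
# character and on the encoding chosen for the port.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]  # A, e-acute, euro sign, musical G clef

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
utf16_lengths = {ch: len(ch.encode("utf-16-le")) for ch in samples}

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_lengths[ch]} byte(s) in UTF-8, "
          f"{utf16_lengths[ch]} byte(s) in UTF-16")
```

So "read one character" and "read one byte" advance the underlying byte
stream by different amounts, and by how much depends on the encoding.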

You have to specify when the port is *opened* whether it is a
binary or character port.  The alternative is for read-byte/write-byte
to peek into the implementation of a character port, and operate
on the underlying byte port.  This is losing:
(a) It complicates synchronizing (buffering) between the character
stream and the byte stream.
(b) Some implementations of character streams may buffer a chunk
of bytes.  If some bytes in the file cannot be mapped to characters
in the current character encoding, an exception may be signalled.
(c) In some environments you cannot get at the underlying byte
stream from a character stream.  This includes Java.  A Scheme
implementation could do its own implementation of character streams
such that you could get at the underlying byte stream, but then
the read functions would only work on character streams created
using Scheme run-time routines, which complicates both implementation
and interoperability.
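
Points (a) and (b) can be reproduced with any layered I/O library. The
sketch below uses Python's io module as a stand-in for a character port
layered on a byte port. It is an analogy, not a description of any Scheme
implementation, and the sample data is made up:

```python
import io

# (a) Reading ONE character through the character layer may pull a whole
# chunk of bytes out of the underlying byte stream.
data = "\u00e9".encode("utf-8") + b"\x00\x01\x02\x03"  # 2-byte character, then 4 more bytes
raw = io.BytesIO(data)
chars = io.TextIOWrapper(raw, encoding="utf-8")

first_char = chars.read(1)    # ask for a single character
bytes_consumed = raw.tell()   # how far the byte stream actually advanced

# A byte-level read on `raw` at this point would not start right after the
# character: the character layer has already taken the bytes that follow.
print(repr(first_char), "consumed", bytes_consumed, "of", len(data), "bytes")

# (b) Bytes that are not valid in the port's encoding can raise an error
# while the buffered chunk is decoded, even though the one character that
# was requested ("a") is perfectly valid.
bad = io.TextIOWrapper(io.BytesIO(b"a\xff\xfe"), encoding="utf-8")
try:
    bad.read(1)
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print("decode error while reading one valid character:", decode_failed)
```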

It makes no sense to mix character and binary I/O on the same port.
Anyone who tries it is in a state of sin.

Kawa does treat binary ports as character ports with a special
character encoding of "binary".  Character ports may of course
have different encodings (UTF-8, Latin-1, JIS, ...) which define
how bytes are mapped to characters and vice versa.  This encoding
must be specified when the port is opened.  The "binary" encoding
is basically char->integer and integer->char.  I.e.
reading a byte I returns (integer->char I).  The advantage
of this trick is that a lot of existing Scheme code that assumes
that characters are the same as bytes can continue to work.
That essentially forces the character encoding to Latin-1 with
Unix line terminators.
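
To see why this amounts to Latin-1 with no line-terminator translation,
here is a sketch in Python. It is not Kawa code; Python's "latin-1" codec
and the newline="" option merely stand in for the "binary" encoding
described above:

```python
import io

# Latin-1 maps byte value I to the character with code I, which is the
# char->integer / integer->char correspondence.
all_bytes = bytes(range(256))
as_chars = all_bytes.decode("latin-1")
codes_match = all(ord(c) == b for c, b in zip(as_chars, all_bytes))
round_trip = as_chars.encode("latin-1")

# Line-terminator translation has to be off as well, or a CR LF pair in the
# data would silently become a single newline character.
port = io.TextIOWrapper(io.BytesIO(b"\r\n\x80"), encoding="latin-1", newline="")
untranslated = port.read()

print(codes_match, round_trip == all_bytes, repr(untranslated))
```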

The alternative is to prohibit the existing character port functions
(read, display, read-char, etc) on binary streams.  That is probably
cleaner and safer.
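
For comparison, Python's io module takes this stricter route: its byte
streams reject text and its character streams reject bytes. A sketch,
where rejects is a hypothetical helper introduced only for this example:

```python
import io

def rejects(port, datum):
    """Return True if writing datum to port is refused with a TypeError."""
    try:
        port.write(datum)
    except TypeError:
        return True
    return False

binary_rejects_text = rejects(io.BytesIO(), "abc")    # text written to a binary port
text_rejects_bytes = rejects(io.StringIO(), b"abc")   # bytes written to a character port

print(binary_rejects_text, text_rejects_bytes)
```
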
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/