Re: binary vs non-binary ports

This page is part of the web mail archives of SRFI 56 from before July 7th, 2015. The new archives for SRFI 56 contain all messages, not just those from before July 7th, 2015.



At Wed, 15 Sep 2004 21:51:44 -0700, Per Bothner wrote:
> 
>  From the draft:
>  > Some Schemes may wish to distinguish between binary and non-binary
>  > ports as in Common-Lisp. As these can be layered on top of the
>  > current ports this may better be relegated to a separate SRFI.
> 
> Huh?  This is backwards.  The current ports are character ports.
> As such they are layered on top of byte ports.  I.e. non-binary
> ports are layered on top of binary ports.

Hello,

I had in fact expected more opposition to this earlier and was
wondering when it would turn up :)

> You have to specify when the port is *opened* whether it is a
> binary or character port.

That is one philosophy.  Another is that given ports with which you
can perform both byte and character operations, you can implement such
B&D-style ports on top of them.

> The alternative is for read-byte/write-byte to peek into the
> implementation of a character port, and operate on the underlying
> byte port.

Ideally, as Bear mentioned earlier, I like to think of the byte-level
operations as the only primitives on top of which character-level
operations are defined, but that is an implementation detail.

> This is losing:
> (a) It complicates synchronizing (buffering) between the character
> stream and the byte stream.

"Complicated" should not prevent us from adding language features, and
I don't see this as any more complex than having additional primitive
port types.

> (b) Some implementations of character streams may buffer a chunk
> of bytes.  If some bytes in the file cannot be mapped to characters
> in the current character encoding, an exception may be signalled.

Yes, and the SRFI goes so far as to provide a SRFI-36 condition for
such a case.

> (c) In some environments you cannot get at the underlying byte
> stream from a character stream.  This includes Java.  A Scheme
> implementation could do its own implementation of character streams
> such that you could get at the underlying byte stream, but then
> the read functions would only work on character streams created
> using Scheme run-time routines, which complicates both implementation
> and interoperability.

This is again the complexity argument.  The above strategy is also a
backwards approach, as you said earlier, and could be made simpler and
more efficient by making the byte-level operations the only
primitives.

> It makes no sense to mix character and binary I/O on the same port.
> Anyone who tries it is in a state of sin.

I work very often with binary file formats, including Scheme libraries
for handling ELF, TIFF, and the gettext .mo format among others.
Every one of these mixes binary and character data.  Apparently almost
everyone who has ever designed a binary format is a sinner :)
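
To make the point concrete, here is a minimal sketch of reading a TIFF
header, which opens with two ASCII characters ("II" or "MM") followed
immediately by binary integers whose byte order those characters
select.  It is written in Python rather than Scheme purely for
illustration; the procedure name read_tiff_header and the in-memory
sample are assumptions of this sketch, not part of SRFI-56.

```python
import io
import struct

def read_tiff_header(port):
    """Read the 8-byte TIFF header from a binary stream (illustrative only)."""
    # Character data: the byte-order mark is two ASCII letters.
    order = port.read(2).decode("ascii")
    if order not in ("II", "MM"):
        raise ValueError("not a TIFF byte-order mark: %r" % order)
    # Binary data: a 16-bit magic number (42) and a 32-bit offset to the
    # first image file directory, in the byte order just announced.
    endian = "<" if order == "II" else ">"
    magic, ifd_offset = struct.unpack(endian + "HI", port.read(6))
    if magic != 42:
        raise ValueError("bad TIFF magic number: %d" % magic)
    return order, ifd_offset

# Hypothetical little-endian header whose first directory is at offset 8.
sample = io.BytesIO(b"II" + struct.pack("<HI", 42, 8))
header = read_tiff_header(sample)
```

Both kinds of read happen on the same port, a few bytes apart, and the
binary reads cannot even be interpreted until the character read has
been done.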

> Kawa does treat binary ports as character ports with a special
> character encoding of "binary".

This is a feature others are likely to want, and you will undoubtedly
find support if you write a SRFI for it.  It can be implemented in
portable Scheme on top of SRFI-56 by redefining the current port
primitives.

Given disjoint ports I can think of 3 options for working with binary
formats, almost all of which include character data:

1) Toggle the port between binary mode and character mode.  Clumsy and
   error-prone.  Does not solve the problem that the character port
   will at times still be pointing to invalid characters.

2) Open two ports to the same source, one in character mode and one in
   binary mode, and read from them separately.  Same problems as above
   with the added difficulty of keeping the two in the same position
   when the binary data is closely interleaved with character data.

3) Read character data from binary ports as bytes first, then convert
   it to characters or strings with utility procedures.  In this case
   I would simply define read-char, read-line, etc. as convenience
   forms in terms of these utilities, and the resulting API would be
   indistinguishable from the current SRFI-56.
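
Option 3 can be sketched roughly as follows.  This is Python rather
than Scheme, for illustration only; the names mirror read-byte,
read-char and read-line, the encoding is assumed to be UTF-8, and None
stands in for the eof-object.  None of this is taken from the SRFI's
reference implementation.

```python
import io

def read_byte(port):
    """The only primitive: one byte as an integer, or None at end of input."""
    b = port.read(1)
    return b[0] if b else None

def read_char(port):
    """Character read defined purely in terms of byte reads (UTF-8 assumed)."""
    first = read_byte(port)
    if first is None:
        return None
    # The lead byte says how many continuation bytes follow.
    if first >= 0xF0:
        extra = 3
    elif first >= 0xE0:
        extra = 2
    elif first >= 0xC0:
        extra = 1
    else:
        extra = 0
    raw = bytes([first]) + port.read(extra)
    # decode raises UnicodeDecodeError on malformed or truncated input,
    # so encoding errors surface here rather than being hidden.
    return raw.decode("utf-8")

def read_line(port):
    """Line read layered on read_char; the newline is consumed, not returned."""
    chars = []
    while True:
        c = read_char(port)
        if c is None:
            return "".join(chars) if chars else None
        if c == "\n":
            return "".join(chars)
        chars.append(c)

# A binary tag byte, a line of text, then another binary byte.
port = io.BytesIO(b"\x02" + "h\u00e9llo\n".encode("utf-8") + b"\xff")
tag = read_byte(port)
line = read_line(port)
trailer = read_byte(port)
```

The caller sees one port on which byte and character reads interleave
freely, which is the same shape of API the SRFI already offers.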

Potential encoding errors are inherent in all ports and have to be
dealt with whether or not you have disjoint port types.  Any safety
measures can and will be circumvented, becoming merely an
inconvenience while providing a false sense of security.

-- 
Alex