finalize or withdraw?

This page is part of the web mail archives of SRFI 56 from before July 7th, 2015. The new archives for SRFI 56 contain all messages, not just those from before July 7th, 2015.

Discussion has settled down and there haven't been many changes
recently, so it's getting time to give SRFI-56 the thumbs up or thumbs
down, but I'm still unhappy with some things.

The library procedures provide a decent set of primitives for single
numeric values.  Other common binary I/O idioms are records and tables
of numeric values; although these would be more efficient with their
own primitives, they can at least be easily implemented on top of
SRFI-56.

The library style of reading and writing single numbers of different
sizes one at a time is especially well suited to applications such as writing a
portable assembler.  However, an assembler invariably needs to read and
write strings within the binary data, and SRFI-56 provides no portable
means to do this.  There are several ways to handle this inevitable
mixing of binary data and text:

  * separate procedures to write text to binary ports
    - something like read/write-utf8-string
    - Java takes this approach letting you read and write UTF-8
      characters to binary ports.
    - a complete API would need to handle all encodings.

  * procedures to convert strings to and from srfi-4 uvectors or blobs

  * separate mixed binary+text ports
    - open-binary-and-character-input-file
    - wouldn't have guarantees about buffering efficiency

  * layered ports
    - open a character port on top of a binary port, read/write a value,
      close the character port then continue using the binary port
    - flexible but more than we need for simple assembler-type cases

Without something like this it is simply impossible to write a portable
assembler.  In fact, the vast majority of binary formats include some
form of text.  If finalized as is, I'm hard pressed to think what
SRFI-56 would make portably possible that wasn't before.

I have three options at this point.  The first is to finalize SRFI-56
and hope for future extensions to sort out the mixed I/O mess.  In the
meantime, implementations which support mixed I/O natively would
probably just do so, leaving Java and C wchar-based implementations out
in the cold.

Alternatively, I could add another quick patch.  The simplest solution is
to provide char->ucs and ucs->char procedures to convert between
characters and their Unicode code-point values.  This would make code
that needed ASCII text in binary files trivial to support, and full
Unicode could be supported by implementing your own
read/write-utf8-string or read-utf8-into-u8vector! or similar.  The
truly ambitious could implement their own encoding routines on top of
this, so at this point we do have full portability.  Convenience
encoding routines could be added in separate SRFIs.
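
As a rough sketch of what "implementing your own" might look like, here
is a UTF-8 encoder built on nothing but code-point values.  The
char->ucs below is only a hypothetical stand-in for the proposed
procedure: it assumes char->integer already returns Unicode code points,
which R5RS does not guarantee.  The names ucs->utf8-bytes and
string->utf8-bytes are likewise made up for illustration, and no
validation (of surrogates, for example) is attempted.

```scheme
;; Hypothetical stand-in for the proposed char->ucs.  ASSUMPTION: on
;; this implementation char->integer yields the Unicode code point.
(define (char->ucs c) (char->integer c))

;; Encode one code point as a list of UTF-8 byte values (0-255).
;; Illustrative name; no checks for surrogates or out-of-range input.
(define (ucs->utf8-bytes cp)
  ;; Continuation byte 10xxxxxx holding six bits of cp, starting at
  ;; the bit position selected by dividing by SHIFT.
  (define (cont shift)
    (+ #x80 (modulo (quotient cp shift) 64)))
  (cond ((< cp #x80)                      ; 1 byte:  0xxxxxxx
         (list cp))
        ((< cp #x800)                     ; 2 bytes: 110xxxxx 10xxxxxx
         (list (+ #xC0 (quotient cp 64)) (cont 1)))
        ((< cp #x10000)                   ; 3 bytes: 1110xxxx + 2 cont.
         (list (+ #xE0 (quotient cp 4096)) (cont 64) (cont 1)))
        (else                             ; 4 bytes: 11110xxx + 3 cont.
         (list (+ #xF0 (quotient cp 262144))
               (cont 4096) (cont 64) (cont 1)))))

;; Encode a whole string, walking it right to left so the byte list
;; comes out in order.
(define (string->utf8-bytes str)
  (let loop ((i (- (string-length str) 1)) (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1)
              (append (ucs->utf8-bytes (char->ucs (string-ref str i)))
                      acc)))))
```

Each resulting byte could then be written to a binary port with
whatever unsigned 8-bit writer is available, so a write-utf8-string
would be little more than a for-each over this list.  For pure ASCII
text only the first branch of the cond is ever taken.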

Or, finally, I could simply withdraw the SRFI and consider writing up
a new one.  Character I/O can't properly be handled without addressing
encodings, and the specification of what an encoding is
(string, symbol or object?) and what forms and names it takes is
probably sufficient for its own SRFI.  Another missing crucial feature
for even minimal binary I/O is random access, without which one can't
even implement a (meaningful) b-tree.  At this point there would be too
many changes to band-aid onto the already long overdue SRFI-56.

Minimality was a major goal, but has SRFI-56 fallen short of usefulness?
I've been using a SRFI-56-like library for many, many projects, but on
checking, every single one of them freely mixes binary and character I/O.