
Re: binary vs non-binary ports

At Sat, 18 Sep 2004 09:31:02 -0700, Per Bothner wrote:
> * Most file formats that mix text and binary i/o do *not* handle
> general strings: often they only support whatever character encoding
> the "creative" engineers are most familiar with.

I think relatively few formats assume a single encoding.  Either they
tend to treat strings agnostically as a sequence of bytes (leaving
encoding interpretation up to the programmer), or they allow a means
to specify the encoding.  Gettext and databases specify the encoding
within the file itself.  HTTP, MIME, and most internet standards also
provide a way to specify the encoding.  MIME allows multi-part
messages which may include files of multiple different encodings
within the same byte stream, and not just character encodings but
compression, encryption and other filters.  HTTP uses a chunked
encoding which requires you to switch back and forth between ASCII (to
read the chunk size) and the chunked data encoding within the same
byte stream, with chunks possibly splitting in the middle of a
character, or in the middle of a state in stateful encodings such as
ISO-2022.  These are ordinary cases in some of the most widely used
protocols.  Mixing encodings is a fact of life.
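
To make the chunked case concrete, here is a minimal sketch in Python
(not Scheme, and not a proposal for any port API).  It reads the
chunk-size lines as ASCII hex and feeds the chunk data to an
incremental decoder, so a character split across two chunks still
decodes correctly.  The function name read_chunked and the sample body
are made up for illustration; trailers and error handling are ignored.

```python
import codecs
import io

def read_chunked(stream, encoding="utf-8"):
    """Decode a chunked body from a binary stream into text (sketch only)."""
    # The incremental decoder keeps partial-character state between chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        # Chunk-size line: ASCII hex, optionally followed by ";extensions".
        size_line = stream.readline()
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last chunk; any trailers are ignored in this sketch
        data = stream.read(size)
        stream.readline()  # consume the CRLF that ends the chunk data
        parts.append(decoder.decode(data))
    # Flush the decoder; raises if the body ended mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

# "na\u00efve" in UTF-8 is b"na\xc3\xafve"; the two-byte character is
# deliberately split between the first and second chunk.
body = b"3\r\nna\xc3\r\n3\r\n\xafve\r\n0\r\n\r\n"
text = read_chunked(io.BytesIO(body))
```

Decoding each chunk independently would fail on the first chunk here,
since it ends with half a character.  The same structure should carry
over to a stateful encoding by passing a different encoding name, as
long as the decoder for it is incremental.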

Oleg has pointed out that Haskell is also in the process of looking
into binary I/O - the discussion is a good reference and comparison: