
Re: binary vs non-binary ports

This page is part of the web mail archives of SRFI 56 from before July 7th, 2015. The new archives for SRFI 56 contain all messages, not just those from before July 7th, 2015.



Alex Shinn wrote:

Ideally, as Bear mentioned earlier, I like to think of the byte-level
operations as the only primitives on top of which character-level
operations are defined, but that is an implementation detail.

Yes, but you don't want to force every Scheme implementor to have
to manage this char<->byte mapping in the Scheme run-time, as opposed
to being able to use existing C/C++/Java APIs which don't work the
way you want them to work.

"Complicated" should not prevent us from adding language features, and
I don't see this as any more complex than having additional primitive
port types.

Byte<->Char conversion is complicated.  Not conceptually, but
there are big tables and a good chunk of code if you want to
support many languages.  Most operating systems and "core libraries"
these days can do the translation.  You really don't want to implement
this code in your Scheme runtime, but instead you want to build on
existing libraries and APIs.

Existing APIs (Java, C++, C) distinguish byte I/O from character I/O,
generally using different types.  They may not support easy on-the-fly
switching between binary mode and character mode.
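
As a small illustration of that split (in Python, which is not one of the
APIs named above but follows the same pattern), byte reads and character
reads go through different types and return different kinds of values. The
sample string is an arbitrary assumption chosen so that one character
occupies two bytes.

```python
import io

# An in-memory byte stream holding UTF-8 text; U+00EF takes two bytes.
raw = io.BytesIO("na\u00efve\n".encode("utf-8"))

# Byte-level read: returns bytes and knows nothing about characters,
# so three bytes end in the middle of the third character.
first_bytes = raw.read(3)          # b"na\xc3"

# Character-level read: a separate wrapper type layered over the byte stream.
raw.seek(0)
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
first_chars = text.read(3)         # "na\u00ef", three whole characters

print(type(first_bytes).__name__, ascii(first_bytes))
print(type(first_chars).__name__, ascii(first_chars))
```

Once the wrapper has buffered data, going back to raw byte reads on the same
stream is not something this API makes convenient, which is the kind of
mode-switching limitation described above.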

So the proposed model means Scheme run-times have to open ports in
binary mode and do their own byte<->char conversion.  That is not a
nice thing to ask of Scheme implementors.
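
To give a feel for what "doing their own conversion" involves, here is a
rough Python sketch of one detail a run-time has to handle: a multi-byte
character split across two reads. The chunk boundary and the sample text are
assumptions made up for the example; the stateful decoder comes from Python's
standard codecs module.

```python
import codecs

# Three Japanese characters, three UTF-8 bytes each (nine bytes total).
data = "\u65e5\u672c\u8a9e".encode("utf-8")

# Pretend the port delivered the bytes in two reads, with the boundary
# falling inside the second character.
chunks = [data[:4], data[4:]]

# Decoding each chunk on its own fails on the incomplete character.
try:
    chunks[0].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

# A stateful decoder holds the dangling byte until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
decoded = [decoder.decode(chunk) for chunk in chunks]
decoded.append(decoder.decode(b"", final=True))

print(naive_ok, [ascii(part) for part in decoded])
```

And that is only the buffering; the encoding tables themselves are supplied
by the library here.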

It makes no sense to mix character and binary I/O on the same port.
Anyone who tries it is in a state of sin.

I work very often with binary file formats, including Scheme libraries
for handling ELF, TIFF, and the gettext .mo format among others.
Every one of these mixes binary and character data.

I did not say character data - I said character I/O.  It is perfectly
feasible to read/write character and string data from/to a binary
stream - but then you have to define how they are encoded or do the
mapping before/after you write/read them.  If you're in a Japanese
locale, and write a string to an ELF file, what happens?  What happens
when I call (newline) in a Windows environment - should it write "\n"
or "\r\n"?
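
A quick Python sketch of why those questions need an answer: the same string
becomes different bytes depending on the encoding, and the same newline
character becomes different bytes depending on the port's line-ending
convention. The three encodings and the helper name written_bytes are
illustrative choices, not anything proposed in the SRFI.

```python
import io

label = "\u65e5\u672c\u8a9e"   # three Japanese characters

# The same string under three encodings plausible in a Japanese locale.
encodings = {name: label.encode(name)
             for name in ("utf-8", "shift_jis", "euc_jp")}

def written_bytes(newline):
    """Return the bytes a text port emits for one newline character."""
    raw = io.BytesIO()
    port = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    port.write("\n")
    port.flush()
    port.detach()              # keep raw open after the wrapper goes away
    return raw.getvalue()

unix_style = written_bytes("\n")       # no translation
windows_style = written_bytes("\r\n")  # "\n" is translated on output

for name, encoded in encodings.items():
    print(name, len(encoded), encoded.hex())
print(unix_style, windows_style)
```

Whatever bytes end up in the ELF file, some layer had to pick one of these,
and a binary format generally cannot leave that choice to the locale.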

Apparently almost
everyone who has ever designed a binary format is a sinner :)

Most of these formats don't support general characters.  Of course
you can have general characters encoded in an ELF section, but ELF
views that as just binary data.  ELF does know about labels and
section names, but there is no support for multiple encodings or
wide characters.

3) Extract character data in binary ports as binary first then convert
   with utility procedures to character/string.

Yes, conceptually that is what should be going on.  But if you want to
be able to do binary I/O on an arbitrary port (that was opened in default
mode) then that constrains the implementation unacceptably.  Existing
code that implements ports may have to be extensively rewritten.
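
For what it's worth, option 3 on a port that was opened as binary might look
roughly like this in Python. The record layout (a 2-byte big-endian length
followed by that many bytes) and the names read_counted_bytes and
bytes_to_string are invented for the example; they do not come from ELF,
TIFF, or the SRFI.

```python
import io
import struct

def read_counted_bytes(port):
    """Read a 2-byte big-endian length, then that many raw bytes."""
    (length,) = struct.unpack(">H", port.read(2))
    payload = port.read(length)
    if len(payload) != length:
        raise EOFError("truncated record")
    return payload

def bytes_to_string(payload, encoding):
    """Utility step: the caller states the encoding explicitly."""
    return payload.decode(encoding)

# Build a sample binary stream: one counted name, then two unrelated bytes.
name = "\u65e5\u672c\u8a9e".encode("utf-8")
port = io.BytesIO(struct.pack(">H", len(name)) + name + b"\x00\x01")

raw_name = read_counted_bytes(port)          # still just bytes
text = bytes_to_string(raw_name, "utf-8")    # conversion happens here
trailer = port.read()                        # remaining binary data

print(len(raw_name), ascii(text), trailer)
```

The port only ever does byte I/O; the character interpretation is a separate
step with the encoding named by the caller.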
--
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/