
Re: Names and primitives in SRFI 56

This page is part of the web mail archives of SRFI 56 from before July 7th, 2015. The new archives for SRFI 56 contain all messages, not just those from before July 7th, 2015.




On Sat, 18 Sep 2004, Hans Oesterholt-Dijkema wrote:
> Alex Shinn wrote:
>> Apart from further conflicting with possible binary/character port
>> distinctions,

>Hmm. I'm not sure I agree on that. Binary I/O simply means there's
>no interpretation given to the I/O; As I see it, the primitives
>to write and read provide the interpretation (see also my earlier
>e-mail about doors and what goes through them).

The thing is, "string," and even "character," is precisely
what binary I/O does not provide: an interpretation of binary
data.

I expect, from a proposal for binary I/O, to get the family
of primitives I need to then go and *write* libraries that
handle reading and writing strings and characters.

This is one of the points where there has been a 'castle built
in the air' in R5RS; we assume the ability to read and write
"characters", but have had no way of accessing what actual
binary forms are read or written.  As a result, the few
binary-handling libraries we have (manipulating, for instance,
executable files, graphics files, audio files, etc), all rely
on the implementation using particular encodings and character
formats, which are in no way guaranteed by the standard.

Worse, as implementations move into the wide weird world of
fully supporting Unicode, those vital utilities are becoming
less portable, not more portable.  Worse still, Unicode has
so complicated the representation of characters (by having
lots of different encodings itself, as well as by increasing
the number of standards that some application a Scheme program
needs to interoperate with might be using) that the simple
assumption that we can read and write "characters" without
specifying their binary form has failed since R5RS was
written.

Let me say that again, for emphasis.  The conditions on which
R5RS was predicated have broken.  Our standard is now broken.

R5RS makes sense, sort of, in a world where any environment
could be assumed to have *some* character encoding so
dominant as to make character encoding a nonissue.  By
leaving such issues unspecified, R5RS left room for different
choices to be made, which would produce sensible systems for
"standalone" use in such environments. That is no longer
the world in which we live.

By failing to specify any means of purely binary I/O, R5RS
left portable scheme unable to cope with a networked world
rich in purely binary formats and a world where characters
from different sources can be encountered in many different
binary formats. And that is the world in which we live now.
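
To make the point concrete, here is a small sketch (in Python
rather than Scheme, purely for illustration) of one character
turning into different octets under different encodings.  The
choice of character and the list of encodings are my own
assumptions, not anything taken from the SRFI.

```python
# One character, several byte sequences, depending on the encoding chosen.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

encodings = ["latin-1", "utf-8", "utf-16-be", "utf-16-le", "utf-32-be"]
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    # Print each encoding name next to the octets it produces, in hex.
    print(f"{name:10} {data.hex(' ')}")

# Reading the UTF-8 octets under the wrong assumption yields different text:
# two Latin-1 characters instead of the single character that was written.
misread = encoded["utf-8"].decode("latin-1")
```

A program that only "reads characters" cannot tell which of
these forms it was handed; a program that reads octets can
decide for itself.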

This SRFI clearly aims to lay some foundation stones that
will hold in the current world.  In order to do so, it
*MUST* specify rigid, purely binary, I/O.

Character I/O, being subject to interpretation and
reinterpretation, is for a different layer, or, as
Alex says,

>> this is beyond the scope of this SRFI.  A general text
>> parsing library with procedures for reading delimited or terminated
>> strings with an optional size limit would be the right place for this.
>
>That's OK with me, but let's start with such an SRFI right away,
>because A binary i/o srfi without primitives for character strings
>seems to me a littlebit, ah how does one say that in english, "disabled?".

Character strings are not binary.  They are characters.
Binary is data without interpretation.  Characters are
an interpretation of binary data.  These are different
ideas.

We can sweep an interpretation like FLOAT32 into "binary" I/O
at this point, but only because of the efforts of the IEEE
which have provided an encoding for such interpretations
that is so universal and dominant as to be a nonissue.
The encoding is sign-bit in the MSB, followed by 8 bits
of exponent, followed by 23 bits of normalized mantissa;
the exponent measures powers of two, not powers of ten or
four or any of the other strange things we had to cope with
fifteen years ago.  The mantissa is normalized base-2, not
BCD.  All that craziness has (mostly) passed away and the
world is a better place for its absence.  But the point is
that if the IEEE standard were not now so universal as to
be a nonissue, floating-point encoding would be a matter
for a separate SRFI, too.
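
The layout just described can be sketched as follows (Python,
for illustration only).  The function names are hypothetical,
and big-endian octet order is an assumption.  The exponent
bias of 127, the implicit leading 1, and the handling of
zeros, subnormals, infinities and NaN are IEEE 754 details
that the description above leaves out.

```python
import struct

def decode_float32(bits):
    """Interpret a 32-bit unsigned integer as an IEEE 754 single-precision value."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0   # sign bit in the MSB
    exponent = (bits >> 23) & 0xFF             # next 8 bits: biased exponent
    fraction = bits & 0x7FFFFF                 # low 23 bits: mantissa fraction
    if exponent == 0xFF:
        # All-ones exponent encodes infinities and NaNs.
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:
        # Zero and subnormals: no implicit leading 1.
        return sign * fraction * 2.0 ** -149
    # Normal numbers: implicit leading 1, exponent biased by 127.
    return sign * (1.0 + fraction / 2.0 ** 23) * 2.0 ** (exponent - 127)

def read_float32_be(data):
    """Decode the first four octets of `data`, assuming big-endian order."""
    return decode_float32(int.from_bytes(data[:4], "big"))

# Cross-check against the platform's own decoder.
sample = struct.pack(">f", 0.15625)
assert read_float32_be(sample) == struct.unpack(">f", sample)[0]
```

Everything here is arithmetic on octets; nothing depends on
how the implementation happens to store its own floats.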

And another point is that if we need to create a different
binary floating-point encoding, for whatever reason, we're
going to look to this SRFI's *BINARY* primitives to give
us the basic tools to implement a way to read and write it.
Encodings can pass out of use, however unlikely it seems;
bits won't.

But character encoding is not, and never has been, universal.
In fact, for the next couple of decades it's looking damned
hairy.  It's way too big and way too unstable to try to build
into a foundation.

Any character I/O SRFI is going to have to be built *ON*
the routines that this SRFI is trying to provide.  It is as
futile to attempt one without first getting purely binary I/O
as it is to attempt to build a castle without first laying
foundation stones.
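
As a rough sketch of that layering (Python, for illustration
only): read_u8 below stands in for a hypothetical octet-level
primitive of the kind a binary I/O SRFI would supply, and
read_utf8_char is a character reader built on nothing else.
Both names are my own assumptions, and the decoder is
deliberately minimal: it does not reject overlong three- and
four-octet forms or surrogate code points.

```python
import io

def read_u8(port):
    """Hypothetical binary primitive: next octet as an int, or None at end of input."""
    b = port.read(1)
    return b[0] if b else None

def read_utf8_char(port):
    """Read one character by interpreting octets as UTF-8, using only read_u8."""
    first = read_u8(port)
    if first is None:
        return None
    if first < 0x80:
        return chr(first)                      # single-octet (ASCII) form
    if 0xC2 <= first <= 0xDF:
        extra, code = 1, first & 0x1F          # two-octet form
    elif 0xE0 <= first <= 0xEF:
        extra, code = 2, first & 0x0F          # three-octet form
    elif 0xF0 <= first <= 0xF4:
        extra, code = 3, first & 0x07          # four-octet form
    else:
        raise ValueError(f"invalid UTF-8 lead octet: {first:#04x}")
    for _ in range(extra):
        nxt = read_u8(port)
        if nxt is None or (nxt & 0xC0) != 0x80:
            raise ValueError("truncated or malformed UTF-8 sequence")
        code = (code << 6) | (nxt & 0x3F)      # append six payload bits
    return chr(code)

port = io.BytesIO("a\u00e9\u20ac".encode("utf-8"))
chars = [read_utf8_char(port) for _ in range(3)]
```

A reader for some other encoding would be a different
procedure over the same read_u8, which is the whole point.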

				Bear