
Re: Encodings.



Agreed.

But I feel compelled to observe that once an object's internal representation
has been formatted/encoded to/from whatever external representation is
desired/required, it is essentially in binary form; therefore binary I/O is
really the root common port format underlying all I/O, and more abstract
ports may be thought of as merely transforming the data before sending it
through (or after receiving it from) a binary port. Although that may seem
like a subtlety, if Scheme were to view ports in this hierarchical way, it
could form the basis of a very flexible data transformation and I/O
architecture; a rough sketch of the idea follows.
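
To make the layering concrete, here is a purely illustrative sketch. None of
these procedures exist in standard Scheme; the names open-binary-input-port
and make-character-input-port are hypothetical placeholders for whatever
constructors such an architecture might provide.

  ;; Hypothetical sketch: a character port as a decoding layer stacked
  ;; on top of an underlying binary (octet) port.

  ;; Open the raw octet stream: the "root" port.
  (define binary-in (open-binary-input-port "data.bin"))

  ;; Stack a decoding layer on top to obtain a character port.
  ;; Swapping 'utf-8 for 'latin-1 changes only the transformation layer;
  ;; the underlying binary port is untouched.
  (define char-in (make-character-input-port binary-in 'utf-8))

  ;; Reads octets from the same stream, interpreted per the encoding.
  (read-char char-in)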

Thanks for your time and patience, -paul-

> From: bear <bear@xxxxxxxxx>
> Date: Fri, 13 Feb 2004 09:40:42 -0800 (PST)
> To: Paul Schlie <schlie@xxxxxxxxxxx>
> Cc: srfi-52@xxxxxxxxxxxxxxxxx
> Subject: Re: Encodings.
> 
> 
> It has long been my opinion that scheme needs binary I/O capabilities
> to be standardized.  Character ports should be distinguished from
> binary ports at a low level, and binary I/O operations should be
> different operations from character I/O operations.
> 
> I'm entirely happy with (read-char) and (write-char) reading "a
> character" (although I'll note that opinions vary on what a character
> is, nuff said) or having (read) or (write) read or write a data object
> in the external representation form which is made of characters.
> 
> Those are character operations, and when dealing strictly with
> character operations, the appropriate place for concerns about
> encoding, endianness, and external canonicalization is below the
> level of the program's notice.  Fold all that stuff into the port code
> for character ports and don't bother the programmer with it.  As far
> as text processing code is concerned, a character is a character is a
> character, or at least that's how it should be.
> 
> That leaves implementors the freedom to implement their character
> ports in terms of whatever abstraction their particular platform uses
> for characters, whether it's utf8be or utf32le or ascii or ebcdic or
> latin1 or ISO-foo or iscii or PETscii or the ZX81 charset or some
> other embedded processor weirdness.  This is not a bug, it's a
> feature.  If there are multiple encodings/canonicalizations/etc in use
> on a system, let schemes on those systems implement multiple kinds of
> character ports.
> 
> But it follows that there is NO WAY we should rely on I/O of
> characters through character ports to read or write a particular
> binary representation for "raw" data such as sound and image files.
> Attempting to do so is bad design, because it breaks an abstraction
> barrier and presumes things which are beyond the program's proper
> control or knowledge.
> 
> The only reason programmers want to write characters that aren't in
> the "normal" encoding/canonicalization/etc, is when they need really
> close control of the exact format of I/O.  But when you need control
> *that* close, you're not talking about a "character" port at all any
> more; you're talking about binary I/O.  Rather than breaking the
> abstraction barrier on character ports, you need a different kind of
> port.  We need binary ports that support operations like (read-bytes)
> and (write-bytes).
> 
> It may be needful to read and write "characters" on these ports; but
> character fields inside binary data formats tend to be both very rigid
> and diverse in their encoding/etc, so character operations on binary
> ports, if supported at all, should IMO have mandatory arguments to
> specify their encoding/etc.
> 
> Bear
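
For illustration, a rough sketch of what the binary-port operations described
in the quoted message might look like. These are not standard Scheme
procedures, and the argument lists shown (byte counts, encoding symbols) are
invented purely to suggest the shape of such an API.

  ;; Hypothetical binary-port API sketch.
  (define in  (open-binary-input-port "image.png"))
  (define out (open-binary-output-port "image-copy.png"))

  ;; Raw octet I/O: no encoding, endianness, or canonicalization applies.
  (define header (read-bytes in 8))
  (write-bytes out header)

  ;; If character reads on a binary port are supported at all, the
  ;; encoding is a mandatory argument, e.g. a fixed-width Latin-1 field:
  (define tag (read-chars in 4 'latin-1))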