
Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.



Agreed.

But I feel compelled to observe that once an object's internal
representation is formatted or encoded to (or parsed from) whatever
external representation is desired or required, that external form is
essentially binary data. Binary I/O therefore represents the common root
port format for all I/O, and more abstract ports may be thought of as merely
munging the data before sending it through (or after receiving it from) a
binary port. That may seem like a subtlety, but if Scheme were to view ports
in this hierarchical way, it could form the basis of a very flexible data
transformation and I/O architecture.
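
As a rough illustration of this layering (in Python rather than Scheme, and
not anything proposed in this thread), Python's standard `io` module happens
to be organized this way: a character port is a wrapper that encodes
characters and hands the resulting bytes to an underlying binary port. The
helper name `open_char_port` is hypothetical.

```python
import io

def open_char_port(binary_port, encoding):
    """Hypothetical helper: layer a character port over a binary port."""
    # TextIOWrapper encodes on write and decodes on read; the bytes
    # themselves only ever pass through the underlying binary port.
    return io.TextIOWrapper(binary_port, encoding=encoding, newline="")

# The root of the hierarchy: a plain in-memory binary port.
binary_port = io.BytesIO()

# A more abstract port that munges characters into UTF-16LE bytes.
char_port = open_char_port(binary_port, "utf-16-le")
char_port.write("hi")
char_port.flush()  # push the encoded bytes down to the binary port

# What actually reached the binary port: two bytes per character.
raw_bytes = binary_port.getvalue()
```

Swapping the encoding argument changes only the wrapper; the binary port
underneath is unaffected, which is the flexibility the hierarchical view
is meant to buy.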

Thanks for your time and patience, -paul-

> From: bear <bear@xxxxxxxxx>
> Date: Fri, 13 Feb 2004 09:40:42 -0800 (PST)
> To: Paul Schlie <schlie@xxxxxxxxxxx>
> Cc: srfi-52@xxxxxxxxxxxxxxxxx
> Subject: Re: Encodings.
> 
> 
> It has long been my opinion that scheme needs binary I/O capabilities
> to be standardized.  Character ports should be distinguished from
> binary ports at a low level, and binary I/O operations should be
> different operations from character I/O operations.
> 
> I'm entirely happy with (read-char) and (write-char) reading "a
> character" (although I'll note that opinions vary on what a character
> is, nuff said) or having (read) or (write) read or write a data object
> in the external representation form which is made of characters.
> 
> Those are character operations, and when dealing strictly with
> character operations, the appropriate place for concerns about
> encoding, endianness, and external canonicalization is below the
> level of the program's notice.  Fold all that stuff into the port code
> for character ports and don't bother the programmer with it.  As far
> as text processing code is concerned, a character is a character is a
> character, or at least that's how it should be.
> 
> That leaves implementors the freedom to implement their character
> ports in terms of whatever abstraction their particular platform uses
> for characters, whether it's utf8 or utf32le or ascii or ebcdic or
> latin1 or ISO-foo or iscii or PETscii or the ZX81 charset or some
> other embedded processor weirdness.  This is not a bug, it's a
> feature.  If there are multiple encodings/canonicalizations/etc in use
> on a system, let schemes on those systems implement multiple kinds of
> character ports.
> 
> But it follows that there is NO WAY we should rely on I/O of
> characters through character ports to read or write a particular
> binary representation for "raw" data such as sound and image files.
> Attempting to do so is bad design, because it breaks an abstraction
> barrier and presumes things which are beyond the program's proper
> control or knowledge.
> 
> The only time programmers want to write characters that aren't in
> the "normal" encoding/canonicalization/etc. is when they need really
> close control of the exact format of I/O.  But when you need control
> *that* close, you're not talking about a "character" port at all any
> more; you're talking about binary I/O.  Rather than breaking the
> abstraction barrier on character ports, you need a different kind of
> port.  We need binary ports that support operations like (read-bytes)
> and (write-bytes).
> 
> It may be needful to read and write "characters" on these ports; but
> character fields inside binary data formats tend to be both very rigid
> and diverse in their encoding/etc, so character operations on binary
> ports, if supported at all, should IMO have mandatory arguments to
> specify their encoding/etc.
> 
> Bear
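
Bear's last point, character operations on binary ports that refuse to guess
an encoding, could be sketched roughly as follows. This is an illustration in
Python under stated assumptions, not a proposed Scheme API: the names
`read_bytes`, `read_chars`, and `write_chars` and the record layout are all
made up for the example.

```python
import io

def read_bytes(port, count):
    """Read exactly `count` raw bytes, or fail if the port runs short."""
    data = port.read(count)
    if len(data) != count:
        raise EOFError(f"wanted {count} bytes, got {len(data)}")
    return data

def read_chars(port, byte_count, *, encoding):
    """Decode a fixed-width character field; `encoding` has no default."""
    return read_bytes(port, byte_count).decode(encoding)

def write_chars(port, text, *, encoding):
    """Encode `text` explicitly and write the resulting bytes."""
    port.write(text.encode(encoding))

# A made-up record: a 4-byte ASCII tag, then a big-endian 16-bit length.
port = io.BytesIO()
write_chars(port, "DATA", encoding="ascii")
port.write((258).to_bytes(2, "big"))

port.seek(0)
tag = read_chars(port, 4, encoding="ascii")
length = int.from_bytes(read_bytes(port, 2), "big")
```

Because `encoding` is a required keyword argument, calling `read_chars`
without it raises a `TypeError` instead of silently falling back to a
platform default.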