Re: Octet vs Char



From: bear <bear@xxxxxxxxx>
Subject: Re: Octet vs Char (Re: strings draft)
Date: Mon, 26 Jan 2004 11:04:13 -0800 (PST)

> On Sun, 25 Jan 2004, Shiro Kawai wrote:
> 
> >I think using strings for binary I/O should be explicitly
> >discouraged, even though an octet sequence can be represented by
> >using such special characters.  It can be very inefficient
> >on some implementations, and it may cause problems on ports
> >that deal with character encodings.
> 
> Hear, Hear.  The standard goes out of its way to *not* assume
> a particular character encoding and repertoire; it follows that
> code relying on a particular character encoding in order to do
> binary I/O is nonportable.

Right.  Since one can't count on a particular mapping between
integers and characters, the portable use of a [0..255] mapping
is limited to treating such characters as opaque: the programmer
may assume only that she can extract the same integer from a
character that the character was created from.
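A minimal sketch of that opacity contract, using only the standard
char->integer / integer->char pair (which integer a given character
corresponds to is implementation-dependent, so only the round trip
is portable):

    ;; The one portable guarantee: the round trip preserves the integer.
    (define (octet->opaque-char n)     ; n assumed to be in [0..255]
      (integer->char n))

    (define (opaque-char->octet c)
      (char->integer c))

    (= 200 (opaque-char->octet (octet->opaque-char 200)))  ; => #t

    ;; NOT portable: assuming *which* character an integer denotes.
    ;; (char=? (integer->char 65) #\A) holds only on ASCII-compatible
    ;; implementations, and fails under e.g. EBCDIC.

(Whether integer->char accepts every value in [0..255] is itself
implementation-dependent, which is part of the problem under
discussion.)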

This fact makes me think that using Scheme characters in a
C FFI where C expects 'char' is not very convenient; it'd be
easier to pass Scheme integers (and to use ascii->char /
char->ascii on the Scheme side, if you wish).
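As a sketch of what passing integers across the boundary might look
like (c-fgetc is a hypothetical FFI binding, assumed to return the
C 'int' result of fgetc() unchanged; ascii->char is the
SLIB/Scheme48-style helper mentioned above, not part of R5RS):

    ;; The FFI hands back a plain integer in [0..255], or -1 for EOF;
    ;; no character decoding happens at the boundary.
    (define (read-byte-via-ffi port)
      (c-fgetc port))                  ; hypothetical FFI procedure

    ;; Only if the caller actually wants a character does a conversion
    ;; happen, and then explicitly, on the Scheme side:
    ;;   (let ((n (read-byte-via-ffi p)))
    ;;     (if (= n -1) 'eof (ascii->char n)))

This keeps the implementation's character encoding entirely out of
the FFI contract.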

Nevertheless, there _might_ be cases where you wish to receive
a character from a C FFI, and you don't want to get an error
when the C function happens to return a code that is not supported
by the Scheme implementation.  That's why I don't think I need
to reject the idea.  Such use of 'octet as character' should be
reserved for exceptional cases, though.

> We should never need to know what the binary encoding of some
> character 'C' is in order to write a .jpg file to disk, but as
> matters stand we do.  It makes no damn sense that a program that
> attempts to write a graphic format or a sound file has to rely
> on ASCII encodings for characters and will fail if run on a
> machine whose character encoding is different -- EBCDIC, or
> utf-8, or utf-16, etc.  This program is not even manipulating
> text; why should character encodings be capable of causing it to
> fail?

Exactly.  That's why I said:

> >I think using strings for binary I/O should be explicitly
> >discouraged, even though an octet sequence can be represented by
> >using such special characters.

--shiro