Re: finalize or withdraw?

[I added a cc to srfi-68.  Followups should probably go there,
since I'm primarily discussing SRFI 68.]

Michael Sperber wrote:
> I don't quite understand what you mean here---it's true that you
> probably can't use the underlying abstractions for text I/O, but you
> certainly can perform text I/O using the facilities in SRFI 68,
> building on the underlying binary I/O.  Trying to build a
> multi-encoding text I/O system that's magically compatible with what
> the common platforms have (i.e. the common implementations of wchar,
> .NET, Java etc.), and still functionally desirable is hard, and I have
> trouble seeing the benefits.

The benefit is for the implementors: If you specify ports that can
arbitrarily mix text and binary then implementors can no longer
use common abstractions and existing libraries.  An implementor
can no longer use the existing APIs for "character ports", but
must instead use "binary ports" and do their own character->binary
mapping.  True, this isn't very difficult given that SRFI 68 only
directly supports UTF-8, but there is still a type mismatch problem
between Scheme ports (implemented using native binary ports) and
native character ports: E.g. I cannot pass a Scheme port to a Java
method expecting a Reader/Writer or vice versa.
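
To make the mismatch concrete, here is a minimal Java sketch.  The
names SchemePort, asReader and firstLine are hypothetical, invented
for illustration; they are not part of SRFI 68 or of any Scheme
implementation.  The port is layered on a native binary stream and
decodes UTF-8 by hand (without validation), so it needs an explicit
adapter before it can go where a Reader is expected.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

public class PortMismatch {
    /** Hypothetical mixed text/binary port built on a native binary stream. */
    static final class SchemePort {
        private final InputStream in;
        SchemePort(InputStream in) { this.in = in; }

        /** Binary read: one byte, or -1 at end of input. */
        int readByte() throws IOException { return in.read(); }

        /** Text read: one UTF-8 code point, or -1 at end of input (no validation). */
        int readChar() throws IOException {
            int b = in.read();
            if (b < 0x80) return b;                    // ASCII, or -1 for EOF
            int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
            int cp = b & (0x3F >> extra);              // payload bits of the lead byte
            for (int i = 0; i < extra; i++) {
                cp = (cp << 6) | (in.read() & 0x3F);   // six bits per continuation byte
            }
            return cp;
        }

        void close() throws IOException { in.close(); }
    }

    /** The adapter an implementor has to write to reach Reader-based APIs. */
    static Reader asReader(final SchemePort port) {
        return new Reader() {
            private int pending = -1;                  // low surrogate not yet delivered

            @Override public int read(char[] buf, int off, int len) throws IOException {
                int n = 0;
                while (n < len) {
                    if (pending >= 0) {
                        buf[off + n++] = (char) pending;
                        pending = -1;
                        continue;
                    }
                    int cp = port.readChar();
                    if (cp < 0) break;
                    if (Character.isBmpCodePoint(cp)) {
                        buf[off + n++] = (char) cp;
                    } else {
                        buf[off + n++] = Character.highSurrogate(cp);
                        pending = Character.lowSurrogate(cp);
                    }
                }
                return (n == 0 && len > 0) ? -1 : n;
            }

            @Override public void close() throws IOException { port.close(); }
        };
    }

    /** Stand-in for any existing Java method that expects a Reader. */
    static String firstLine(Reader r) throws IOException {
        return new BufferedReader(r).readLine();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x07, 'h', (byte) 0xC3, (byte) 0xA9, '\n'};
        SchemePort port = new SchemePort(new ByteArrayInputStream(data));
        int tag = port.readByte();                     // binary part of the file
        // firstLine(port) would not compile: a SchemePort is not a Reader.
        String text = firstLine(asReader(port));       // text part, via the adapter
        System.out.println(tag + " " + text);
    }
}
```

Going the other way (handing a native Reader to code that expects
a Scheme port) has no such adapter, since a Reader cannot give back
raw bytes.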

To support other encodings SRFI 68 introduced translators and/or
transcoders, but I find the information on these a bit sparse.  (I
haven't read the specification carefully, though.)  They seem to be
binary-to-binary translators.  Handling a file containing a mix of
binary and non-UTF-8 text seems difficult, which negates some of the
point of being able to mix binary and text.

Implementing a translator may be difficult.  For example, while
Java has had general support for text ports with multiple
encodings, it's only relatively recently (JDK 1.4) that the
translation machinery has been directly available.  I think
it is possible to implement translated streams without
direct access to the translation service, but it requires
a high-overhead pipeline.
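
For comparison, with direct access to the translation service the
decoding step can be driven by hand.  Below is a minimal sketch
using java.nio.charset.CharsetDecoder, the API added in JDK 1.4.
The class and method names are made up for illustration, and the
StandardCharsets constants in main assume Java 7 or later.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class DirectDecode {
    /** Run one decode step and move the characters it produced into out. */
    private static void step(CharsetDecoder dec, ByteBuffer in, CharBuffer chars,
                             StringBuilder out, boolean endOfInput) {
        CoderResult r;
        do {
            r = dec.decode(in, chars, endOfInput);
            if (r.isError()) {
                try {
                    r.throwException();            // malformed or unmappable input
                } catch (CharacterCodingException e) {
                    throw new UncheckedIOException(e);
                }
            }
            chars.flip();
            out.append(chars);
            chars.clear();
        } while (r.isOverflow());
    }

    /** Decode byte chunks incrementally with one decoder, as a port would. */
    static String decodeChunks(Charset cs, byte[]... chunks) {
        CharsetDecoder dec = cs.newDecoder();
        StringBuilder out = new StringBuilder();
        CharBuffer chars = CharBuffer.allocate(64);
        ByteBuffer in = ByteBuffer.allocate(0);
        for (byte[] chunk : chunks) {
            // Carry over bytes left unconsumed by the previous step
            // (the head of a character split across two chunks).
            ByteBuffer next = ByteBuffer.allocate(in.remaining() + chunk.length);
            next.put(in).put(chunk);
            next.flip();
            in = next;
            step(dec, in, chars, out, false);
        }
        step(dec, in, chars, out, true);           // end of input: rejects a truncated character
        CoderResult r;
        do {
            r = dec.flush(chars);                  // emit anything the decoder still holds
            chars.flip();
            out.append(chars);
            chars.clear();
        } while (r.isOverflow());
        return out.toString();
    }

    public static void main(String[] args) {
        // "h" followed by U+00E9, with the two UTF-8 bytes of U+00E9 split across chunks.
        String s = decodeChunks(StandardCharsets.UTF_8,
                new byte[] {0x68, (byte) 0xC3}, new byte[] {(byte) 0xA9});
        System.out.println(s.equals("h\u00e9"));
    }
}
```

Without this API the same bytes have to be pushed through a stream
and read back through an InputStreamReader, which is the pipeline
I mean above.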

The default encoding of a character port *must* be the
"native" encoding of the user's locale.  I don't see
how anything else can even be seriously considered:
a beginning Scheme programmer should be able to write
a simple program that reads or writes a file without
having to set up translators, or specify an encoding.
SRFI 68 appears to contradict this requirement.  Unless
this is addressed, I think SRFI 68 is a non-starter.
Perhaps in 10 years we can ignore "legacy" environments
that don't use UTF-8, but we're far from there yet.

Note also that some encodings are stateful: the meaning of
a sequence of bytes depends on previous bytes.  This makes
mixing text and binary data fragile.
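
A small Java illustration of the effect.  UTF-16 with a byte-order
mark is only a mild stand-in here for properly stateful encodings
such as ISO-2022-JP, chosen because every Java runtime supports it;
the class and method names are invented for the example.  The same
two payload bytes decode to different characters depending on the
bytes that came before them.

```java
import java.nio.charset.StandardCharsets;

public class StatefulBytes {
    /** Decode a byte-order mark followed by a payload with Java's "UTF-16" charset. */
    static String decode(byte[] bom, byte[] payload) {
        byte[] all = new byte[bom.length + payload.length];
        System.arraycopy(bom, 0, all, 0, bom.length);
        System.arraycopy(payload, 0, all, bom.length, payload.length);
        return new String(all, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] payload = {0x00, 0x41};
        byte[] bigEndianBom = {(byte) 0xFE, (byte) 0xFF};
        byte[] littleEndianBom = {(byte) 0xFF, (byte) 0xFE};

        // After a big-endian mark the payload is U+0041 ("A").
        System.out.println(decode(bigEndianBom, payload).equals("A"));
        // After a little-endian mark the same bytes are U+4100.
        System.out.println(decode(littleEndianBom, payload).equals("\u4100"));
    }
}
```

If a binary read consumes or skips the earlier bytes, the text
that follows is decoded in the wrong state.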
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/