This page is part of the web mail archives of SRFI 68 from before July 7th, 2015. The new archives for SRFI 68 contain all messages, not just those from before July 7th, 2015.
[I added a cc to srfi-68. Followups should probably go there, since I'm primarily discussing SRFI 68.]

Michael Sperber wrote:
> I don't quite understand what you mean here---it's true that you
> probably can't use the underlying abstractions for text I/O, but you
> certainly can perform text I/O using the facilities in SRFI 68,
> building on the underlying binary I/O. Trying to build a
> multi-encoding text I/O system that's magically compatibly with what
> the common platforms have (i.e. the common implementations of wchar,
> .NET, Java etc.), and still functionally desirable is hard, and I have
> trouble seeing the benefits.

The benefit is for the implementors: If you specify ports that can arbitrarily mix text and binary, then implementors can no longer use common abstractions and existing libraries. An implementor can no longer use the existing APIs for "character ports", but must instead use "binary ports" and do their own character->binary mapping. True, this isn't very difficult given that SRFI 68 only directly supports UTF-8, but there is still a type mismatch problem between Scheme ports (implemented using native binary ports) and native character ports: E.g. I cannot pass a Scheme port to a Java method expecting a Reader/Writer or vice versa.

To support other encodings SRFI 68 introduces translators and/or transcoders, but I find the information on these a bit sparse. (I haven't read the specification carefully, though.) They seem to be binary-to-binary translators. Handling a file containing a mix of binary and non-UTF-8 text seems difficult, which negates some of the point of being able to mix binary and text.

Implementing a translator may be difficult. For example, while Java has long had general support for text ports with multiple encodings, it's only relatively recently (JDK 1.4) that the translation machinery has been directly available.
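
To make the Java point concrete, here is a minimal sketch (not taken from SRFI 68 or from any existing Scheme implementation) of the two routes: a character port layered over a binary port with `InputStreamReader`, and direct use of the `java.nio.charset` decoder that arrived in JDK 1.4. The class name `DecodeSketch` and its method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; the names are not from any SRFI or library.
public class DecodeSketch {

    // Route 1: a native "character port". The Reader owns the decoder, so
    // callers see only characters and cannot get at the raw bytes any more.
    static String readViaReader(InputStream in, Charset cs) throws IOException {
        Reader reader = new InputStreamReader(in, cs);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Route 2: the translation machinery used directly (java.nio.charset,
    // available since JDK 1.4). A fresh decoder reports malformed input by
    // throwing CharacterCodingException.
    static String decodeDirect(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder();
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        // "caf" followed by e-acute, encoded as ISO-8859-1 (one byte per char).
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };

        System.out.println(readViaReader(new ByteArrayInputStream(latin1),
                                         StandardCharsets.ISO_8859_1));
        System.out.println(decodeDirect(latin1, StandardCharsets.ISO_8859_1));

        // The same bytes are not well-formed UTF-8, so a port that assumes
        // UTF-8 cannot read this file without an explicit translator.
        try {
            decodeDirect(latin1, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```

Route 1 is what a Java method expecting a Reader wants to be handed; a Scheme port built only on the binary layer would have to reimplement something like route 2 itself.
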
I think it is possible to implement translated streams without direct access to the translation service, but it requires a high-overhead pipeline.

The default encoding of a character port *must* be the "native" encoding of the user's locale. I don't see how anything else can even be seriously considered: a beginning Scheme programmer should be able to write a simple program that reads or writes a file without having to set up translators, or specify an encoding. SRFI 68 appears to contradict this requirement. Unless this is addressed, I think SRFI 68 is a non-starter. Perhaps in 10 years we can ignore "legacy" environments that don't use UTF-8, but we're far from there yet.

Note also that some encodings are stateful: the meaning of a sequence of bytes depends on previous bytes. This makes mixing text and binary data fragile.

--
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/