This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.
Gauche has supported mixed binary/character ports, as well as multiple character encoding schemes, for several years now, and I have gained some experience from it.

I agree that mixing them is a semantic mess. The fundamental issue here is that a port is an interface between the Scheme world and the external world, and the external world IS a mess. If you've written an application to process emails, collect documents from the web, or search documents on your hard disk, you know this: in general, you are given a file, and you don't know which encoding it is in until you actually read the content and examine it (you have to try several heuristics, and sometimes you even need to guess).

The choice is either to make ports handle the mess, or to break up the concept of ports into layers and allow mess-handling code to be inserted between them. I understand srfi-68 tried the latter, which is cleaner IMHO, but has an efficiency issue. Gauche chose the former (i.e. the native ports support char/binary mixed I/O, as well as various codec conversions) because of efficiency.

As the aims differ among implementations, I don't want the basic srfi to force an implementor into a specific implementation strategy. Mandating char/binary mixed ports would cause difficulty in some implementations. OTOH, mandating strict char/binary port separation would also cause efficiency issues in some implementations (see below for why one would want such mixed ports).

So, I would prefer a basic srfi that doesn't mandate, but allows, char/binary mixed ports. The srfi can have a procedure that may create a binary port from a char port, which can be something like this:

  (char-port->binary-port <char-port> [<encoding>]) => <binary-port>

If the implementation supports char/binary mixed ports and <encoding> matches its internal encoding, it can return <char-port> itself, avoiding the overhead.
One use case of such mixing is reading a document with the following characteristics: you know the beginning of the document consists of ASCII characters, which might contain an explicit specification of the character set of the content that follows. If the beginning part doesn't contain a character set spec, you use some default encoding which is an extension of ASCII. A large number of documents out there fall into this category: all XML and HTML documents, for example (for HTML, I mean a line like <meta http-equiv="content-type" content="text/html; charset=euc-jp">). Gauche also allows Scheme program source to contain a charset spec in a comment near the beginning of the program.

If you can mix char/binary ports, you can create a port with a default encoding and read the beginning part. If it doesn't have an explicit spec, which is presumably the typical case, you can just keep using the port without overhead; if it has an explicit charset spec, you can insert a codec which reads the rest of the input as binary.

--shiro
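
The sniff-then-decode use case described above can be sketched roughly as follows. This is only an illustration in Python rather than Scheme, and it is not how Gauche implements it: the helper name open_sniffed, the 1024-byte sniff window, the "utf-8" default, and the requirement that the input be seekable are all assumptions made for the example.

```python
import codecs
import io
import re

# Matches a declaration such as charset=euc-jp in the ASCII head of a document.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def open_sniffed(binary_stream, default_encoding="utf-8", sniff_bytes=1024):
    """Hypothetical helper: return (text_stream, encoding) for a seekable
    binary stream whose beginning is ASCII and may declare a charset."""
    # Read the head as raw bytes; no decoding has happened yet.
    head = binary_stream.read(sniff_bytes)
    encoding = default_encoding
    match = CHARSET_RE.search(head)
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # only checks that the codec is known
            encoding = name
        except LookupError:
            pass  # unknown charset name: keep the default encoding
    # Rewind and layer a decoder for the chosen encoding over the bytes.
    binary_stream.seek(0)
    return io.TextIOWrapper(binary_stream, encoding=encoding), encoding


if __name__ == "__main__":
    html = ('<meta http-equiv="content-type" '
            'content="text/html; charset=euc-jp">\n').encode("ascii")
    body = "日本語".encode("euc-jp")
    text, enc = open_sniffed(io.BytesIO(html + body))
    print(enc)
    print(text.read())
```

Note that this sketch always rewinds and wraps the stream in a decoder, so it corresponds to the layered approach. With a mixed char/binary port, the no-declaration case could simply keep reading from the original port.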