[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Issues with Unicode



On 23-Apr-06, at 4:54 AM, Jonathan S. Shapiro wrote:

...

3. There is an issue with newline processing in input and output (which
probably is the subject of a different SRFI). Platforms do not agree
about newline conventions in text files. A regrettable consequence is
that character streams require specification at open time as to whether
they are being opened for binary or text processing.

One regrettable consequence of this is that the R5RS specification for
open-output-file and open-input-file is inadequate. A second argument
needs to be added to specify newline processing conventions. Note that
this also became an issue for UNIX STDIO, and that acceptance of "t" and
"b" in the file mode argument to fopen() is now mandated by the C
standard.

This is also an issue for string ports.

In general, any operation that opens a port must specify the desired
processing for newlines.

...

9. Once you have a variable-length character representation, it becomes
necessary to incorporate separate means for reading bytes from input
streams. For example this is needed if the programmer wishes to
construct code to process files in (e.g.) UTF-32. This raises a question
about newline canonicalization. My suggestion is that the port's
handling of newlines should be independent of the caller. That is,
read-byte on a text-mode port that would normally convert the input \r\n to \n should return the byte corresponding to \n. If you want unmangled
bytes, use binary mode input.

The same argument does *not* apply for read-char, because it is the
nature of read-char to process the bytes in order to determine character
length.

For a solution to these problems see SRFI 91. I would appreciate feedback on the SRFI 91 mailing list if you think it does not satisfy your needs.

Marc