
Re: Issues with Unicode

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: "Jonathan S. Shapiro" <shap@xxxxxxxxxxx>
Subject: Re: Issues with Unicode
From: Marc Feeley <feeley@xxxxxxxxxxxxxxxx>
Date: Sun, 23 Apr 2006 09:25:56 -0400
Cc: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <y9lbqushdsw.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <y9lbqushdsw.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx>

On 23-Apr-06, at 4:54 AM, Jonathan S. Shapiro wrote:

...
3. There is an issue with newline processing in input and output (which
probably is the subject of a different SRFI). Platforms do not agree
about newline conventions in text files. A regrettable consequence is
that character streams require specification at open time as to whether
they are being opened for binary or text processing.

One regrettable consequence of this is that the R5RS specification for
open-output-file and open-input-file is inadequate. A second argument
needs to be added to specify newline processing conventions. Note that
this also became an issue for UNIX STDIO, and that acceptance of "t" and
"b" in the file mode argument to fopen() is now mandated by the C
standard.

This is also an issue for string ports.

In general, any operation that opens a port must specify the desired
processing for newlines.
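
As an illustration of what "specifying newline processing at open time"
can look like, here is a minimal Python sketch. It is not the Scheme
interface under discussion and not what SRFI 91 specifies; it only relies
on the standard `newline` argument of Python's built-in open(). The helper
name read_three_ways is hypothetical.

```python
import os
import tempfile


def read_three_ways(raw: bytes):
    """Write raw bytes to a temporary file, then read them back under
    three different open-time settings (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)

        # Binary mode: bytes come back exactly as stored.
        with open(path, "rb") as f:
            as_binary = f.read()

        # Text mode, newline=None: \r\n and lone \r are translated to \n.
        with open(path, "r", encoding="ascii", newline=None) as f:
            translated = f.read()

        # Text mode, newline="": line endings are passed through untouched.
        with open(path, "r", encoding="ascii", newline="") as f:
            untranslated = f.read()
    finally:
        os.remove(path)
    return as_binary, translated, untranslated


if __name__ == "__main__":
    print(read_three_ways(b"one\r\ntwo\n"))
```

The point of the sketch is only that the choice is made once, by the
caller, when the port is opened, rather than per read operation.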

...
9. Once you have a variable-length character representation, it becomes
necessary to incorporate separate means for reading bytes from input
streams. For example this is needed if the programmer wishes to
construct code to process files in (e.g.) UTF-32. This raises a question
about newline canonicalization. My suggestion is that the port's
handling of newlines should be independent of the caller. That is,
read-byte on a text-mode port that would normally convert the input \r\n
to \n should return the byte corresponding to \n. If you want unmangled
bytes, use binary mode input.

The same argument does *not* apply for read-char, because it is the
nature of read-char to process the bytes in order to determine character
length.
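
The read-byte suggestion above can be sketched as follows. This is a
minimal Python model under stated assumptions, not an existing API: the
class TextModeBytePort and the helper drain are hypothetical names, and the
only conversion modelled is \r\n to \n (a lone \r is passed through).

```python
import io

CR, LF = 0x0D, 0x0A


class TextModeBytePort:
    """Hypothetical text-mode port: read_byte sees \r\n as a single \n."""

    def __init__(self, stream):
        self._stream = stream   # underlying binary stream
        self._pending = None    # one byte of pushed-back lookahead

    def _next_raw(self):
        # Return the next unprocessed byte as an int, or None at end of input.
        if self._pending is not None:
            b, self._pending = self._pending, None
            return b
        chunk = self._stream.read(1)
        return chunk[0] if chunk else None

    def read_byte(self):
        b = self._next_raw()
        if b == CR:
            following = self._next_raw()
            if following == LF:
                return LF               # \r\n collapses to \n
            self._pending = following   # lone \r: push the next byte back
        return b


def drain(port):
    """Collect every byte the port yields until end of input."""
    out = []
    while (b := port.read_byte()) is not None:
        out.append(b)
    return out


if __name__ == "__main__":
    data = b"a\r\nb\rc"
    print(drain(TextModeBytePort(io.BytesIO(data))))  # text-mode view
    print(list(data))                                 # binary-mode view
```

Reading the same data in "binary mode" (here simply list(data)) yields the
unmangled bytes, which matches the advice to use binary mode input when
the raw bytes are wanted.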

For a solution to these problems see SRFI 91. I would appreciate
feedback on the SRFI 91 mailing list if you think it does not satisfy
your needs.


Marc
