Re: the "Unicode Background" section

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



me:
>> In my view, DISPLAY (in R6RS, not forever) should be undefined in
>> that case (and in all cases where a string contains a 
>> non-8-bit-character) 

John:
> There are no such things as "8-bit characters" per se.  There are a 
> variety of 8-bit encodings that allow up to 256 characters, but they
> are not the same characters in all cases.

[I think your question is related to Shiro's which I intend to
 answer separately but, for now: ]

"Character" is overloaded in this discussion (various Unicode concepts
and the Scheme type).

I'm suggesting a base-level I/O system that consumes bits from
a port at some framing, is undefined if the integer value of
those bits is greater than 255, and otherwise returns the
Scheme CHAR having that codepoint value.
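
A minimal sketch of that read operation, written in Python rather than Scheme purely for illustration. The name `read_char`, the `frame_bits` parameter, the big-endian byte order, and the restriction to whole-byte frames are all assumptions of the sketch, not part of the proposal. Where the proposal says "undefined", the sketch raises an error so the case is visible.

```python
import io

def read_char(port, frame_bits=8):
    """Hypothetical low-level read: consume one frame, return the char with that codepoint.

    Assumptions of this sketch: frames are whole bytes, multi-byte frames
    are big-endian, and end of input is reported as None.
    """
    if frame_bits <= 0 or frame_bits % 8 != 0:
        raise ValueError("this sketch only handles whole-byte framings")
    nbytes = frame_bits // 8
    raw = port.read(nbytes)
    if len(raw) == 0:
        return None  # end of input
    if len(raw) < nbytes:
        raise ValueError("truncated frame")
    value = int.from_bytes(raw, "big")
    if value > 255:
        # The proposal leaves this case undefined; raising is this sketch's choice.
        raise ValueError("frame value %d is outside 0..255" % value)
    return chr(value)

# 8-bit framing: one byte in, one char out.
eight_bit = read_char(io.BytesIO(b"A"))
# 16-bit framing: two bytes in, still one char, as long as the value fits.
sixteen_bit = read_char(io.BytesIO(b"\x00\xe9"), frame_bits=16)
```

Note that nothing here interprets the bits as any particular encoding; the frame value is taken directly as the codepoint.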

This is a fairly radical proposal.   It means, for example,
that READ-CHAR will never know squat about UTF-8:  READ-CHAR
is doomed, under my suggestions, to remain forever a low-level
procedure.
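
To make the UTF-8 consequence concrete, here is a small Python illustration (again a stand-in for Scheme; `read_chars_8bit` is a hypothetical helper, not a proposed procedure). Reading the UTF-8 encoding of a single non-ASCII character one 8-bit frame at a time yields two chars, one per byte, and no decoding happens at this level.

```python
def read_chars_8bit(data):
    """Hypothetical helper: map each 8-bit frame to the char with that codepoint."""
    return [chr(b) for b in data]

# U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
utf8_bytes = "\u00e9".encode("utf-8")
chars = read_chars_8bit(utf8_bytes)
# chars holds two chars, U+00C3 and U+00A9, not the single char U+00E9.
```

Any UTF-8 awareness would have to live in a layer built on top of this one.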

On the other hand, it's upward compatible and sets a stage
for experimentation re I/O paradigms.  (Upward compat with
the standard, not implementations -- the divergence being
over how procedures are named, not what they do.)

(I'm fuzzy about how it would make sense to reconcile
the byte-io routines with framing.)

-t