Re: the "Unicode Background" section



me:
>> In my view, DISPLAY (in R6RS, not forever) should be undefined in
>> that case (and in all cases where a string contains a 
>> non-8-bit-character) 

John:
> There are no such things as "8-bit characters" per se.  There are a 
> variety of 8-bit encodings that allow up to 256 characters, but they
> are not the same characters in all cases.

[I think your question is related to Shiro's which I intend to
 answer separately but, for now: ]

"Character" is overloaded in this discussion (various Unicode concepts
and the Scheme type).

I'm suggesting a base-level I/O system that consumes bits from
a port at some framing, is undefined if the integer value of
those bits is greater than 255, and otherwise returns the
Scheme CHAR having that codepoint value.
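
[A rough model of that rule, in Python rather than Scheme so it can be
 run as-is.  This is only a sketch: the name read_char, the restriction
 to whole-byte framings, big-endian frames, None as an EOF stand-in, and
 raising an exception to represent "undefined" are all assumptions of
 the sketch, not part of the proposal.]

```python
import io


class UndefinedBehavior(Exception):
    """Stand-in for 'undefined'; a real system could do anything here."""


def read_char(port, framing=8):
    """Read one frame of `framing` bits; return the char with that codepoint.

    Assumptions (not in the proposal): framing is a multiple of 8,
    frames are big-endian, and end of input is reported as None.
    """
    if framing <= 0 or framing % 8 != 0:
        raise ValueError("this sketch only handles whole-byte framings")
    raw = port.read(framing // 8)
    if len(raw) == 0:
        return None  # end of input
    if len(raw) < framing // 8:
        raise UndefinedBehavior("truncated frame")
    value = int.from_bytes(raw, "big")
    if value > 255:
        # The proposal leaves this case undefined; the sketch raises.
        raise UndefinedBehavior("frame value %d exceeds 255" % value)
    return chr(value)


print(repr(read_char(io.BytesIO(b"A"))))             # 8-bit framing
print(repr(read_char(io.BytesIO(b"\x00\xe9"), 16)))  # 16-bit framing, value 233
```

At 16-bit framing the frame 0x0100 (value 256) would land in the
undefined case, which the sketch reports by raising.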

This is a fairly radical proposal.   It means, for example,
that READ-CHAR will never know squat about UTF-8:  READ-CHAR
is doomed, under my suggestions, to remain forever a low-level
procedure.
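
[To make the consequence concrete, here is a small Python model, not
 Scheme, of what a low-level READ-CHAR at 8-bit framing would return
 for UTF-8 input.  The names read_char_8 and decode_utf8_layer are
 hypothetical, and the decoding layer is just one way something could
 be built on top, not something proposed here.]

```python
import io


def read_char_8(port):
    """Model of a low-level READ-CHAR at 8-bit framing: one byte, one char."""
    raw = port.read(1)
    return chr(raw[0]) if raw else None  # None stands in for EOF


def decode_utf8_layer(chars):
    """Hypothetical higher layer: reinterpret the low-level chars as UTF-8.

    Each char has a codepoint in 0..255, so Latin-1 encoding recovers
    the original bytes exactly before they are decoded as UTF-8.
    """
    return "".join(chars).encode("latin-1").decode("utf-8")


port = io.BytesIO("é".encode("utf-8"))  # the two bytes 0xC3 0xA9
chars = []
while True:
    c = read_char_8(port)
    if c is None:
        break
    chars.append(c)

print([hex(ord(c)) for c in chars])  # ['0xc3', '0xa9']
print(decode_utf8_layer(chars) == "é")
```

The low-level reader hands back two chars for one UTF-8 encoded
character; recovering the original is left to whatever sits above it.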

On the other hand, it's upward compatible and sets a stage
for experimentation re I/O paradigms.  (Upward compat with
the standard, not implementations -- the divergence being
over how procedures are named, not what they do.)

(I'm fuzzy about how it would make sense to reconcile
the byte-io routines with framing.)

-t