Re: the "Unicode Background" section

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

Thomas Lord scripsit:

> Permitting unpaired surrogates does not damage interoperability
> -- programs need only avoid trying to transmit them on channels
> where strictly well-formed UTF-* is called for.  

In fact, it is not ill-formed to have an unpaired surrogate in *any*
UTF encoding; it's just semantically meaningless.
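
How a lone surrogate is treated in practice depends on the implementation.
As a rough illustration only, assuming CPython 3 and its standard codecs
(nothing here is specified by SRFI 75 or R6RS, and the helper name
try_encode is made up for this sketch): the string type holds an unpaired
surrogate without complaint, the strict UTF-8 encoder refuses it, and the
"surrogatepass" error handler encodes it anyway.

```python
# Assumption: CPython 3 behaviour; other languages and libraries may differ.
lone = "\ud800"  # a high surrogate with no low surrogate following it


def try_encode(text, errors):
    """Return the UTF-8 bytes, or None if the encoder rejects the text."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


# The default ("strict") handler rejects the unpaired surrogate.
strict_result = try_encode(lone, "strict")

# "surrogatepass" emits the three-byte form of the code point as-is.
lenient_result = try_encode(lone, "surrogatepass")

# Decoding with the same handler recovers the original one-element string.
round_trip = lenient_result.decode("utf-8", "surrogatepass")

print(len(lone), strict_result, lenient_result, round_trip == lone)
```

So a program can carry such a value internally and only has to decide what
to do with it at the point where it is written to a channel.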

> In my view, DISPLAY (in R6RS, not forever) should be undefined in that
> case (and in all cases where a string contains a non-8-bit-character) --

There are no such things as "8-bit characters" per se.  There is a variety
of 8-bit encodings, each of which allows up to 256 characters, but they do
not all encode the same characters.
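
To make that concrete, here is a small sketch, assuming Python 3 and three
8-bit codecs from its standard library (the choice of byte 0xE9 and of
these particular encodings is mine, purely for illustration).  The same
single byte decodes to a different character under each encoding.

```python
# One byte, three 8-bit encodings, three different characters.
raw = bytes([0xE9])

# Codec names as registered in Python's standard library.
encodings = ["latin-1", "koi8_r", "cp437"]

# Map each encoding name to the character it assigns to byte 0xE9.
decoded = {name: raw.decode(name) for name in encodings}

for name, ch in decoded.items():
    # Print the code point rather than the glyph, to avoid console issues.
    print(f"{name:8} U+{ord(ch):04X}")
```

A string described only as "8-bit" therefore says nothing about which
characters it contains until the encoding is named.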

John Cowan    http://www.ccil.org/~cowan   <jcowan@xxxxxxxxxxxxxxxxx>
    "Any legal document draws most of its meaning from context.  A telegram
    that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
    5-bit Baudot code plus appropriate headers) is as good a legal document
    as any, even sans digital signature." --me