
Re: Issues with Unicode

This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.



A few nits to an otherwise well-reasoned argument:

Jonathan S. Shapiro scripsit:

> A note: I'm assuming in all of this that scheme will move to an
> international character set. The problems I am about to discuss do not
> manifest in a system implementing only a 7-bit or 8-bit character set.

But they do manifest quite well in 16-bit and 24-bit national character
sets, so even avoiding Unicode doesn't avoid the problem.

> If we have a resynchronization mechanism, and the implementation chooses
> to provide legacy support for the ISO-10646 legacy character planes,

There's no reason for anyone to do that.  By agreement between ISO
and the Unicode Consortium (both of which have to agree to get any
character into either standard), those planes will never be used
for anything.  (As I mentioned before, I'm willing to back this
judgment with real money.)

> then the implementation must provide up to 11 characters of "push
> back" (6 good ones and 5 bad ones). In the absence of legacy planes, I
> believe (but check me) that 7 characters of push-back is sufficient.

Quite so.

> Something I said earlier deserves stronger emphasis: not all character
> sets have the ability to resynchronize!
> 
> Unicode, I promise you, will not be the last character set in the
> universe. 

No, just the last one on the planet.  :-)

However, the others are far from disappearing, and some of them are
indeed not synchronizable.
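
To make "synchronizable" concrete, here is a minimal sketch in Python (used only for illustration; it is not Scheme and not part of any SRFI). It relies on one property of UTF-8: continuation bytes always have the bit pattern 10xxxxxx, so a reader dropped into the middle of a stream can skip forward to the next character boundary. The name resync_utf8 is hypothetical. Encodings such as Shift_JIS, where a trail byte can fall in the ASCII range, offer no comparable test.

```python
def resync_utf8(data: bytes, pos: int) -> int:
    """Return the first index >= pos that is not a UTF-8 continuation byte.

    Hypothetical helper for illustration: it only locates a boundary,
    it does not validate the sequence that starts there.
    """
    # Continuation bytes match 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


# "a", then the three-byte euro sign, then "b".
data = "a\u20acb".encode("utf-8")

# Index 2 lands inside the euro sign; resynchronizing skips to "b".
boundary = resync_utf8(data, 2)
rest = data[boundary:].decode("utf-8")
```

Starting from a position that is already a boundary returns that position unchanged, so the helper is safe to call unconditionally.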

> We need to add read-byte, write-byte, and friends, but we should firmly
> segregate character ports and byte ports. Byte ports should NOT support
> object I/O (in the form of READ/WRITE/DISPLAY, nor READ-CHAR). The
> atomic unit of transfer in a byte port should be the byte. The atomic
> unit of transfer in "classic" ports should be the character.

I agree absolutely, and would add:

We need standard procedures that take a byte port and a representation of
an encoding and return a character port.
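
As a rough sketch of what such a procedure could look like, here is the same idea in Python (again for illustration only, not a proposed Scheme API). The name open_character_port is hypothetical; io.BytesIO and io.TextIOWrapper are the standard-library byte stream and decoding wrapper. The byte port transfers bytes, the wrapped port transfers characters, and the two are kept as distinct objects.

```python
import io


def open_character_port(byte_port, encoding):
    """Hypothetical helper: wrap a byte port in a decoding character port."""
    # newline="" disables newline translation so characters pass through as-is.
    return io.TextIOWrapper(byte_port, encoding=encoding, newline="")


# A byte port holding the UTF-8 encoding of a string that starts with
# a two-byte character (Greek small letter lambda, U+03BB).
byte_port = io.BytesIO("\u03bb-calculus".encode("utf-8"))

# On the byte port the atomic unit is the byte: this reads half a character.
first_byte = byte_port.read(1)
byte_port.seek(0)

# On the character port the atomic unit is the character.
char_port = open_character_port(byte_port, "utf-8")
first_char = char_port.read(1)
remainder = char_port.read()
```

Once the wrapper exists, reading directly from the underlying byte port is best avoided, since the wrapper buffers ahead; that is one practical argument for the firm segregation proposed above.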

-- 
John Cowan    cowan@ccil.org    http://ccil.org/~cowan
Objective considerations of contemporary phenomena compel the conclusion
that optimum or inadequate performance in the trend of competitive
activities exhibits no tendency to be commensurate with innate capacity,
but that a considerable element of the unpredictable must invariably be
taken into account. --Ecclesiastes 9:11, Orwell/Brown version