
Re: Why are byte ports "ports" as such?



Jonathan S. Shapiro wrote:
On Tue, 2006-05-23 at 11:57 -0700, Per Bothner wrote:
What is the use-case for read-char, as you define it?
What is the use-case for a "character" data type that is
*not* a codepoint data type?

We are getting to the jagged edge of what I know about UNICODE,

A little knowledge is a dangerous thing ...

but here is the situation as I understand it.

The underlying issue within UNICODE is the existence of the so-called
"combining characters". There exist characters that have no single
defining codepoint. These exist primarily in Asian languages, for
example in the form of multiple code points that together form a single
"glyph".

You're using the wrong terminology here, I think, but never mind.

The use case, then, seems self evident: programs that must be aware of
these at the code-point level.

You're contradicting yourself: I asked about a use-case for *character*
as a separate *data type*.

You have given no such use-case.

The codepoint==char presumption is simply untrue in some non-western
languages.

We know that.  However, there is still no need for "character" [in the
Unicode sense] as a separate data type:

Code that works on compound characters *as a unit* can and should use a
string type.  Code that needs to look *inside* a compound character
needs to work with codepoints.
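
To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Scheme). The sample text, "e" followed by U+0301 COMBINING ACUTE ACCENT, and all variable names are assumptions of this example, not anything proposed in the thread. It treats the compound character as a short string when handled as a unit, and as a sequence of codepoints when looking inside it.

```python
import unicodedata

# Assumed example: "e" followed by U+0301 COMBINING ACUTE ACCENT.
compound = "e\u0301"

# As a unit: the compound character is just a (short) string value.
as_unit = compound

# Looking inside: work with the individual codepoints.
codepoints = [ord(c) for c in compound]

# Canonical combining class per codepoint; 0 means "not a combining mark".
combining_classes = [unicodedata.combining(c) for c in compound]

print(len(as_unit))                    # number of codepoints in the unit
print([hex(cp) for cp in codepoints])
print(combining_classes)
```

No third "character" type is needed in this sketch: the string covers the unit case and the integers cover the codepoint case.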

In Java, "character" is actually a Unicode code-point.  This is how it
should be in Scheme, though we might want to replace the 16-bit size
by a 21-bit size to avoid the complexities of surrogate pairs.
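
As a rough illustration of the surrogate issue, the Python sketch below encodes one codepoint outside the 16-bit range as UTF-16 and splits the result into 16-bit units. The chosen codepoint (U+1D11E) and the variable names are assumptions of this example. It also computes how many bits the largest Unicode codepoint, U+10FFFF, requires.

```python
# Assumed example codepoint outside the 16-bit range: U+1D11E.
cp = 0x1D11E
s = chr(cp)

# Encode as big-endian UTF-16 and split into 16-bit code units.
utf16 = s.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# One codepoint becomes two 16-bit units (a surrogate pair).
print([hex(u) for u in units])

# Bits needed to hold any Unicode codepoint (maximum is U+10FFFF).
bits_needed = (0x10FFFF).bit_length()
print(bits_needed)
```

With a character type wide enough for every codepoint, the two-unit case in this sketch would not arise.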
--
	--Per Bothner
per@bothner.com   http://per.bothner.com/