Re: Why are byte ports "ports" as such?
Jonathan S. Shapiro wrote:
> On Tue, 2006-05-23 at 11:57 -0700, Per Bothner wrote:
>> What is the use-case for read-char, as you define it?
>> What is the use-case for a "character" data type that is
>> *not* a codepoint data type?
>
> We are getting to the jagged edge of what I know about UNICODE,

A little knowledge is a dangerous thing ...

> but here is the situation as I understand it.
> The underlying issue within UNICODE is the existence of the so-called
> "combining characters". There exist characters that have no single
> defining codepoint. These exist primarily in Asian languages, for
> example in the form of multiple code points that together form a single

You're using the wrong terminology here, I think, but never mind.

> The use case, then, seems self evident: programs that must be aware of
> these at the code-point level.

You're contradicting yourself: I asked about a use-case for *character*
as a separate *data type*.
You have given no such use-case.

> The codepoint==char presumption is simply untrue in some non-western

We know that. However, there is still no need for "character" [in the
Unicode sense] as a separate data type:
Code that works on compound characters *as a unit* can and should use a
string type. Code that needs to look *inside* a compound character,
needs to work with codepoints.
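
A rough sketch of that division of labour, in Python only because its
strings are indexed by code point. The helper names (same_compound,
code_points, combining_marks) are made up for illustration, and the
use of NFC normalization for comparison is an assumption, not
something proposed in this thread:

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one compound
# character as the user sees it, but two code points.
compound = "e\u0301"

def same_compound(a, b):
    # Working on compound characters as a unit: compare short strings,
    # normalized so composed and decomposed spellings are equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def code_points(s):
    # Looking inside a compound character: iterate over its code points.
    return [ord(c) for c in s]

def combining_marks(s):
    # Code points with a non-zero canonical combining class.
    return [c for c in s if unicodedata.combining(c) != 0]
```

Here the compound character never needs a type of its own: it is a
string when handled whole and a sequence of code points when taken
apart.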
In Java, "character" is actually a Unicode code-point. This is how it
should be in Scheme, though we might want to replace the 16-bit size
by a 20-bit size to avoid the complexities of surrogate characters.
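
To make the surrogate complication concrete, here is a small sketch,
again in Python. The helper utf16_code_units is a made-up name; it
just encodes a string as big-endian UTF-16 and splits the result into
16-bit units, which is roughly what a 16-bit character type would see:

```python
def utf16_code_units(s):
    # Encode as big-endian UTF-16 (no byte-order mark) and split the
    # bytes into 16-bit code units.
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range.
clef = "\U0001D11E"

# One code point when the character type is wide enough ...
clef_code_points = [ord(c) for c in clef]

# ... but two 16-bit units (a surrogate pair) in UTF-16.
clef_units = utf16_code_units(clef)
```

With a character type wide enough for every code point, indexing and
length need no special cases for surrogate pairs.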