
Re: Why are byte ports "ports" as such?

This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.



Jonathan S. Shapiro wrote:
> On Tue, 2006-05-23 at 11:57 -0700, Per Bothner wrote:
>> What is the use-case for read-char, as you define it?
>> What is the use-case for a "character" data type that is
>> *not* a codepoint data type?

> We are getting to the jagged edge of what I know about Unicode,

A little knowledge is a dangerous thing ...

> but here is the situation as I understand it.

> The underlying issue within Unicode is the existence of the so-called
> "combining characters". There exist characters that have no single
> defining codepoint. These exist primarily in Asian languages, for
> example in the form of multiple code points that together form a single
> "glyph".

You're using the wrong terminology here, I think, but never mind.

> The use case, then, seems self-evident: programs that must be aware of
> these at the code-point level.

You're contradicting yourself: I asked about a use-case for *character*
as a separate *data type*.

You've given no such use-case.

> The codepoint==char presumption is simply untrue in some non-western
> languages.

We know that.  However, there is still no need for "character" [in the
Unicode sense] as a separate data type:

Code that works on compound characters *as a unit* can and should use a
string type.  Code that needs to look *inside* a compound character
needs to work with codepoints.
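
A minimal sketch of that split, in Python purely for illustration (its str type is already a sequence of code points, roughly the model argued for here). The names compound, code_points and names are hypothetical, and the combining sequence "e" + U+0301 is an assumed example, not one taken from this thread:

```python
import unicodedata

# Hypothetical example: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
# It displays as one "é", but it is two code points.
compound = "e\u0301"

def code_points(s: str) -> list[str]:
    """Look *inside* a string: list its code points in U+XXXX form."""
    return [f"U+{ord(c):04X}" for c in s]

def names(s: str) -> list[str]:
    """Return the Unicode name of each code point in s."""
    return [unicodedata.name(c) for c in s]

# Working at the code-point level.
print(len(compound))          # 2
print(code_points(compound))  # ['U+0065', 'U+0301']
print(names(compound))        # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# Working with the compound *as a unit*: ordinary string operations suffice.
word = "caf" + compound
print(word.endswith(compound))  # True

# NFC happens to have a precomposed form (U+00E9) for this pair, showing that
# the same visible character can be one code point or two.
print(code_points(unicodedata.normalize("NFC", compound)))  # ['U+00E9']
```

No separate "character" type is involved: the unit is a string, and the parts are code points.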

In Java, a "character" (char) is meant to be a Unicode code-point;
strictly, it is a 16-bit UTF-16 code unit.  This is how it should be in
Scheme, though we might want to widen the 16-bit size to 21 bits (enough
for U+10FFFF) to avoid the complexities of surrogate pairs.
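
To illustrate the 16-bit problem, here is a small Python sketch (language chosen only for illustration; utf16_units and utf16_via_codec are hypothetical helper names). It splits a code point beyond U+FFFF into a UTF-16 surrogate pair, as a 16-bit char type requires. The supplementary-plane offset fits in 20 bits, but the full code-point range up to U+10FFFF needs 21:

```python
def utf16_units(cp: int) -> list[int]:
    """Hypothetical helper: encode one code point as UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]                   # BMP: fits in a single 16-bit unit
    offset = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate: bottom 10 bits
    return [high, low]

def utf16_via_codec(cp: int) -> list[int]:
    """Cross-check against Python's built-in UTF-16 (big-endian) codec."""
    data = chr(cp).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

smiley = 0x1F600                                       # outside the 16-bit range
print([hex(u) for u in utf16_units(smiley)])           # ['0xd83d', '0xde00']
print(utf16_units(smiley) == utf16_via_codec(smiley))  # True
print((0x10FFFF).bit_length())                         # 21: bits for the largest code point
print(len(chr(smiley)))                                # 1: one code point, one str element
```

A char type wide enough for any code point, like Python's one-element-per-code-point str, never needs this surrogate arithmetic.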
--
	--Per Bothner
per@bothner.com   http://per.bothner.com/