This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.
Jonathan S. Shapiro wrote:
On Tue, 2006-05-23 at 11:57 -0700, Per Bothner wrote:
What is the use-case for read-char, as you define it? What is the use-case for a "character" data type that is *not* a codepoint data type?
We are getting to the jagged edge of what I know about UNICODE,
A little knowledge is a dangerous thing ...
but here is the situation as I understand it. The underlying issue within UNICODE is the existence of the so-called "combining characters". There exist characters that have no single defining codepoint. These exist primarily in Asian languages, for example in the form of multiple code points that together form a single "glyph".
You're using the wrong terminology here, I think, but never mind.
The use case, then, seems self evident: programs that must be aware of these at the code-point level.
You're contradicting yourself: I asked about a use-case for *character* as a separate *data type*. You have given no such use-case.
The codepoint==char presumption is simply untrue in some non-western languages.
We know that. However, there is still no need for "character" [in the Unicode sense] as a separate data type: Code that works on compound characters *as a unit* can and should use a string type. Code that needs to look *inside* a compound character needs to work with codepoints. In Java, "character" is actually a Unicode code-point. This is how it should be in Scheme, though we might want to replace the 16-bit size by a 20-bit size to avoid the complexities of surrogate characters.
--
--Per Bothner
per@bothner.com http://per.bothner.com/
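The distinction drawn in this message (a string for a compound character handled as a unit, code points for looking inside it) can be sketched in Java, the language the message mentions. This is only an illustrative sketch, not code from the thread: the class name CodePointDemo and its helper methods are hypothetical. It relies on the standard String.codePointCount, String.codePoints, and Character.toChars methods, and on the fact that a Java char is a 16-bit UTF-16 code unit while code points are passed around as int, which is where surrogate pairs come from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names below are illustrative assumptions.
class CodePointDemo {
    // Number of 16-bit UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // "Looking inside" a compound character: list each code point as U+XXXX.
    static List<String> inspect(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // A compound character kept as a unit: 'e' followed by U+0301
        // (combining acute accent). It is held in a string, not a char.
        String eAcute = "e\u0301";

        // A code point outside the 16-bit range: U+1D11E (musical G clef).
        // In a Java String it occupies two code units (a surrogate pair).
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(inspect(eAcute));
        System.out.println(codeUnits(clef) + " code units, "
                + codePoints(clef) + " code point");
    }
}
```

The accented letter is two code points but is passed around as one string, while the clef is one code point that needs two 16-bit units, which is the surrogate complexity a wider code-point type would avoid.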