
Re: Strings/chars

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

On Tue, 23 Dec 2003, Shiro Kawai wrote:

>From: Michael Sperber <sperber@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
>Subject: Re: Strings/chars
>Date: Tue, 23 Dec 2003 11:56:07 +0100
>> What's your take on combining characters?
>I don't have a clear idea at the application level, and can only
>imagine that we need several layers.   As Tom Lord mentioned,
>eventually we'd have such layers, and the R5RS character would fade
>away in the long term.
>Bear's approach (as far as I understand, each "character" consists
>of base character + zero or more combining characters; correct me
>if I'm wrong) looks suitable for most linguistic text processing.

That's the basic application I had in mind.  Right now there are
some weird things in the implementation that I don't really know
how to address.  For example, a combining character by itself can
be written using (write), where it comes out as #\Uxxxx, but it
can't be (display)ed.  In some sense it's a pseudocharacter,
like the control characters: not fully a glyph on its own.
But it's legit enough that most character-type primitives
(the char=?, char<?, and char>? predicates, etc.) need to be able
to work on it.
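
To make the "base character + zero or more combining characters"
model concrete, here is a rough sketch in Python (not Scheme, and not
the actual implementation).  It assumes that "combining character"
means any code point whose Unicode general category starts with "M";
the names clusters, is_combining and write_form are made up for
illustration.  A lone combining character at the start of the text
simply becomes a cluster of its own, which is the awkward case
described above.

```python
import unicodedata

def is_combining(ch):
    # Assumption: treat every Mark (Mn, Mc, Me) as a combining character.
    return unicodedata.category(ch).startswith("M")

def clusters(text):
    # Group code points into base + zero or more combining characters.
    out = []
    for ch in text:
        if is_combining(ch) and out:
            out[-1] += ch      # attach the mark to the preceding base
        else:
            out.append(ch)     # new base, or a lone mark with no base
    return out

def write_form(ch):
    # Hypothetical external form for one code point, mimicking #\Uxxxx.
    return "#\\U%04X" % ord(ch)

acute = "\u0301"               # COMBINING ACUTE ACCENT
circumflex = "\u0302"          # COMBINING CIRCUMFLEX ACCENT

print(clusters("e" + acute + "a"))   # two clusters: e + accent, then a
print(write_form(acute))             # the accent alone still has a written form
print(acute < circumflex)            # and still orders by code point
```

The ordering in the last line is plain code point order, which is
one possible reading of char<? and not the only one.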

>An application may need more data per character, such as
>how it is represented in the original data, or which language
>it belongs to---it's application dependent, so if we ever want
>to expose it to C FFI, a "character" wouldn't just map to an
>integer; instead, it would be an opaque object with a full set of APIs
>to extract various information.

Hmmm.  That's probably true.  I'd been pushing the "codepoint"
thing hard, but the more we get into characters, the more they
turn into moderately complicated little bundles.
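
As a rough illustration of such a "bundle", here is a sketch in
Python (again not Scheme or C, and not a proposal for the SRFI).
The class RichChar and all of its fields are hypothetical; the two
metadata fields stand in for the "original representation" and
"language" examples from the quoted text.  One assumption worth
flagging: the metadata is left out of equality, so two characters
with the same code points compare equal whatever they are tagged
with.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass(frozen=True)
class RichChar:
    base: str                                   # one base code point
    marks: Tuple[str, ...] = ()                 # zero or more combining code points
    # Application-dependent metadata, excluded from == and hash (assumption).
    source_bytes: Optional[bytes] = field(default=None, compare=False)
    language: Optional[str] = field(default=None, compare=False)

    def codepoints(self):
        # Accessor: the integer code points, base first.
        return tuple(ord(c) for c in (self.base, *self.marks))

    def text(self):
        # Accessor: the character as an ordinary string.
        return self.base + "".join(self.marks)

# source_bytes here is the Latin-1 byte for a precomposed e-acute.
c = RichChar("e", ("\u0301",), source_bytes=b"\xe9", language="fr")
print(c.codepoints())
print(c == RichChar("e", ("\u0301",), language="vi"))
```

Seen from a C FFI, only the accessors would be visible; the layout
of the object would stay opaque.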