This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.
Thomas Bushnell BSG wrote:
Right. A text editor needs an input mechanism and a mapping from that mechanism to characters.
Not really. It needs a mapping from input events to actions. Some of those actions may be to insert *strings* into the buffer, or to append a string to a search string.
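A minimal sketch of that "input events to actions" mapping, in Python. Everything here is hypothetical and invented for illustration: the `Editor` class, the `KEYMAP` table, the event names such as `"Compose-e-'"`, and `handle_event` are not taken from any real editor or from the SRFI. The only point is that the actions deal in strings, never in a separate character type.

```python
from typing import Callable, Dict


class Editor:
    """Toy editor state: a text buffer and a search string (both plain strings)."""

    def __init__(self) -> None:
        self.buffer = ""
        self.search = ""

    def insert(self, text: str) -> None:
        # Insert a *string* at the end of the buffer (no cursor in this sketch).
        self.buffer += text

    def extend_search(self, text: str) -> None:
        # Append a string to the current search string.
        self.search += text


Action = Callable[[Editor], None]


def insert_action(text: str) -> Action:
    """Build an action that inserts the given string into the buffer."""
    return lambda ed: ed.insert(text)


# Hypothetical event names mapped to actions.
KEYMAP: Dict[str, Action] = {
    "a": insert_action("a"),
    "Tab": insert_action("    "),              # one event, four code points
    "Compose-e-'": insert_action("e\u0301"),   # "e" plus a combining acute accent
    "Search-a": lambda ed: ed.extend_search("a"),
}


def handle_event(ed: Editor, event: str) -> None:
    """Look up the action bound to an event and run it; unbound events are ignored."""
    action = KEYMAP.get(event)
    if action is not None:
        action(ed)
```

With this shape, an event that produces several code points (the tab expansion or the composed accent above) needs no special handling, because an insertion is always a string.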
Except that text is an assemblage of characters, not of code points. The editor needs functions like "display this character",
No, it needs functions like "display this string".
"move to next character",
Yes, but this is no different from "move to next word". It doesn't need to work on the character except as *part of the buffer*.
"tell the user what character this is",
Why? Not many editors provide this, and in any case it's only for advanced users.
and even "convert this character to some standard interchange format".
No, it needs "convert this string/buffer to some standard interchange format".
What I want is a *character* type for a text editor.
What you want and what you need are not the same thing. Somebody who uses a text editor does not need characters; they need strings. When you implement a text editor, characters can be useful, but having them as a separate data type is just pointless overhead. You could implement a character using an interned type, like symbols, and then implement a string as a vector of (pointers to) symbols. But I'm hoping you're not actually proposing this as a good implementation strategy for a text editor - or programming language. However, I totally fail to guess at what you are proposing.
What is *certainly* useless is a "code point" type.
They're useless - except for implementing strings and buffers.
A text editor need not deal with encodings *at all*. Think of it: the keyboard driver provides characters to the text editor. Real, full-fledged, characters.
No it doesn't - it provides keystroke events. An "input method" provides strings in general, not individual characters, since it may need to do word lookup.
And the text editor asks the display widget to display a character in a particular font and context (since some characters have different glyphs).
As I said: The display widget works with *strings*, not characters.
But never does it really care about encodings.
Of course it does. Fonts are indexed by code-point.
Think of it this way: an editor should not even *care* what the underlying encodings are for characters; it should be entirely irrelevant.
Well, at some point you're going to have to ask "is this a digit" or "is this a space". To do that correctly and portably, you need to index the Unicode tables, which are indexed by code-points. Of course that is rather low-level: instead, I'm arguing for an API like "is the character at this position in this string/buffer white-space". This is a special case of the more general: "does the substring after this position match this regular expression".
Anyway, this is all irrelevant. Until you specify an actual "character API" and propose a practical implementation strategy, then I think the discussion is pointless.
--
--Per Bothner
per@bothner.com http://per.bothner.com/
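The position-based predicate described in the message can be sketched in Python as follows. The function names `matches_at`, `is_whitespace_at`, and `is_digit_at` are hypothetical, and using Python's `re` module (whose `\s` and `\d` classes are Unicode-aware for `str` patterns) is an assumption made for illustration, not something the message specifies.

```python
import re

# Compiled once; for str patterns these classes consult Unicode properties.
WHITESPACE = re.compile(r"\s")
DIGIT = re.compile(r"\d")


def matches_at(buffer: str, pos: int, pattern: "re.Pattern[str]") -> bool:
    """General form: does the text starting at `pos` match `pattern`?"""
    return pattern.match(buffer, pos) is not None


def is_whitespace_at(buffer: str, pos: int) -> bool:
    """Special case: is there white-space at this position in the buffer?"""
    return matches_at(buffer, pos, WHITESPACE)


def is_digit_at(buffer: str, pos: int) -> bool:
    """Special case: is there a decimal digit at this position in the buffer?"""
    return matches_at(buffer, pos, DIGIT)
```

The caller never extracts a character value: it passes the buffer and a position, and the Unicode table lookup stays inside the matcher. Asking about a position at the end of the buffer simply yields `False`.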