
Re: Why are byte ports "ports" as such?



Thomas Bushnell BSG wrote:
> Right.  A text editor needs an input mechanism and a mapping from that
> mechanism to characters.

Not really.  It needs a mapping from input events to actions.  Some
of those actions may be to insert *strings* into the buffer, or to
append a string to a search string.
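
To make the point concrete, here is a rough sketch (in Python, purely for
illustration) of a keymap that maps input events to actions, where the
actions insert or append strings.  The names Buffer, KEYMAP, dispatch and
the event names are all invented for this example; they are not taken
from any real editor.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Buffer:
    """Hypothetical editor buffer: text, an insertion point, a search string."""
    text: str = ""
    point: int = 0
    search: str = ""

    def insert(self, s: str) -> None:
        # Insert a whole string at the point; no per-character API is needed.
        self.text = self.text[:self.point] + s + self.text[self.point:]
        self.point += len(s)


Action = Callable[[Buffer], None]


def insert_string(s: str) -> Action:
    """Build an action that inserts the string `s` at the point."""
    return lambda buf: buf.insert(s)


def append_to_search(s: str) -> Action:
    """Build an action that appends the string `s` to the search string."""
    def action(buf: Buffer) -> None:
        buf.search += s
    return action


# Event names are made up for illustration.
KEYMAP: Dict[str, Action] = {
    "key-a": insert_string("a"),
    # One event, one action, but the inserted string is two code points:
    # 'e' followed by a combining acute accent.
    "compose-e-acute": insert_string("e\u0301"),
    "search-key-a": append_to_search("a"),
}


def dispatch(buf: Buffer, event: str) -> None:
    """Look up the action bound to `event` and run it; ignore unbound events."""
    action = KEYMAP.get(event)
    if action is not None:
        action(buf)
```

Note that the "compose-e-acute" binding inserts a multi-code-point string
in one step, which is why the action is phrased in terms of strings.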

> Except that text is an assemblage of characters, not of code points.
> The editor needs functions like "display this character",

No, it needs functions like "display this string".

> "move to next character",

Yes, but this is no different from "move to next word".  It doesn't
need to work on the character except as *part of the buffer*.

> "tell the user what character this is",

Why?  Not many editors provide this, and in any case it's only
for advanced users.

> and even "convert this character to some standard interchange format".

No, it needs "convert this string/buffer to some standard interchange
format".

> What I want is a *character* type for a text editor.

What you want and what you need are not the same thing.
Somebody who uses a text editor does not need characters;
they need strings.  When you implement a text editor, characters
can be useful, but having them as a separate data type is just
pointless overhead.  You could implement a character using an
interned type, like symbols, and then implement a string as
a vector of (pointers to) symbols.  But I'm hoping you're not
actually proposing this as a good implementation strategy for
a text editor - or programming language.  However, I totally fail
to guess at what you are proposing.

> What is *certainly* useless is a "code point" type.

They're useless - except for implementing strings and buffers.

> A text editor need not deal with encodings *at all*.  Think of it: the
> keyboard driver provides characters to the text editor.  Real,
> full-fledged, characters.

No it doesn't - it provides keystroke events.  An "input method"
provides strings in general, not individual characters, since
it may need to do word lookup.

> And the text editor asks the display widget
> to display a character in a particular font and context (since some
> characters have different glyphs).

As I said: The display widget works with *strings*, not characters.

> But never does it really care about encodings.

Of course it does.  Fonts are indexed by code-point.

> Think of it this way: an editor should not even *care* what the
> underlying encodings are for characters; it should be entirely
> irrelevant.

Well, at some point you're going to have to ask "is this a digit"
or "is this a space".  To do that correctly and portably, you need
to index the Unicode tables, which are indexed by code-points.  Of
course that is rather low-level: instead, I'm arguing for an API
like "is the character at this position in this string/buffer
white-space".  This is a special case of the more general: "does
the substring after this position match this regular expression".
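
As a rough illustration of that kind of API (sketched in Python, and
only one possible shape for it), the predicates below take a string and
a position rather than a character value.  The function names are
invented for this example.  It relies on Python's re module, whose \s
and \d classes consult the Unicode tables for str patterns, and treats
the white-space and digit tests as special cases of "match a regular
expression at this position".

```python
import re


def matches_at(text: str, pos: int, pattern: str) -> bool:
    """True if `pattern` matches the substring of `text` starting at `pos`."""
    # Pattern.match(string, pos) anchors the match at index `pos`.
    return re.compile(pattern).match(text, pos) is not None


def is_whitespace_at(text: str, pos: int) -> bool:
    """Is the character at `pos` white-space?  Special case of matches_at."""
    return matches_at(text, pos, r"\s")


def is_digit_at(text: str, pos: int) -> bool:
    """Is the character at `pos` a decimal digit?  Special case of matches_at."""
    return matches_at(text, pos, r"\d")
```

For example, is_whitespace_at("a b", 1) is true, and a position at the
end of the string simply fails to match instead of needing a separate
"no character here" value.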

Anyway, this is all irrelevant.  Until you specify an actual
"character API" and propose a practical implementation strategy,
then I think the discussion is pointless.
--
	--Per Bothner
per@bothner.com   http://per.bothner.com/