
Re: Why are byte ports "ports" as such?



Per Bothner <per@bothner.com> writes:

>> Except that text is an assemblage of characters, not of code points.
>> The editor needs functions like "display this character",
>
> No, it needs functions like "display this string".

Do you use Emacs?  Do you ever use C-x =?

> Yes, but this is no different from "move to next word".  It doesn't
> need to work on the character except as *part of the buffer*.

Right.  And what is most convenient for the editor is simply to
increment a pointer.  You want the editor to have to peek inside and
suddenly care about encodings and whatnot, things it otherwise need
not attend to.

Encodings should only matter to the editor when exporting and
importing files.  The rest of the time, the editor should be
encoding-blind. 
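A minimal Python sketch of that division of labour, assuming the buffer is held as an already-decoded Python `str`. The names `import_text`, `export_text` and `delete_region` are hypothetical and do not belong to any editor's API. The encoding is named only at the two boundaries, and the editing command in between never sees bytes.

```python
def import_text(data: bytes, encoding: str) -> str:
    """Decode once, at the boundary where a file enters the editor."""
    return data.decode(encoding)


def export_text(text: str, encoding: str) -> bytes:
    """Encode once, at the boundary where the buffer leaves the editor."""
    return text.encode(encoding)


def delete_region(text: str, start: int, end: int) -> str:
    """An editing command: it works on decoded text, never on bytes."""
    return text[:start] + text[end:]


# Latin-1 in, UTF-8 out; only the two boundary calls name an encoding.
buffer = import_text(b"caf\xe9 au lait", "latin-1")
buffer = delete_region(buffer, 4, 7)   # drop " au"
saved = export_text(buffer, "utf-8")
```

A Python `str` is itself a sequence of code points, so this sketch is blind to byte encodings but not to the code point versus character distinction argued in the rest of this message.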

>> What I want is a *character* type for a text editor.
>
> What you want and what you need are not the same thing.
> Somebody who uses a text editor does not need characters;
> they need strings.  

Sorry, but I think of a string as an array of characters.  

> When you implement a text editor, characters
> can be useful, but having them as a separate data type is just
> pointless overhead.  

Great, then you don't need characters.  But *certainly* this is not an
argument for taking code points and *calling* them characters.

>> What is *certainly* useless is a "code point" type.
>
> They're useless - except for implementing strings and buffers.

Except that a string is an array of characters.

> Of course it does.  Fonts are indexed by code-point.

No, they are not.  They are indexed by character.  Consider an
accented character that is represented by several code points.  
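To make that concrete in Python, whose `str` is a sequence of code points: "é" can be written as one precomposed code point or as "e" followed by U+0301 COMBINING ACUTE ACCENT. Both display as one glyph, but a code point index counts the second as two elements, so incrementing a pointer by one lands between the letter and its accent. The `next_char` helper is a hypothetical, simplified illustration that only skips combining marks. Full segmentation into user-perceived characters is specified by Unicode's UAX #29 and covers more cases.

```python
import unicodedata

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE: one code point
decomposed = "e\u0301"    # "e" + COMBINING ACUTE ACCENT: two code points

# The same character on screen, but different code point sequences.
assert len(precomposed) == 1 and len(decomposed) == 2
assert precomposed != decomposed
# They are canonically equivalent: NFC normalization composes the pair.
assert unicodedata.normalize("NFC", decomposed) == precomposed


def next_char(text: str, pos: int) -> int:
    """Advance past one base code point and any combining marks after it.

    Simplified sketch: real grapheme-cluster breaking (UAX #29) handles
    more than combining marks.
    """
    pos += 1
    while pos < len(text) and unicodedata.combining(text[pos]):
        pos += 1
    return pos


word = "caf" + decomposed   # displays as "café" but holds 5 code points
```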

> Well, at some point you're going to have to ask "is this a digit"
> or "is this a space".  To do that correctly and portably, you need
> to index the Unicode tables, which are indexed by code-points.  Of
> course that is rather low-level: instead, I'm arguing for an api
> like "is the character at this position in this string/buffer
> white-space".  This is a special case of the more general: "does
> the substring after this position match this regular expression".

Except that white-space might not be a regex. ;)
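A sketch of the position-based predicate described in the quoted text, in Python, whose `str` methods and `re` module consult the Unicode character database by code point. `is_space_at` and `matches_at` are hypothetical names. In keeping with the aside above, the first uses no regular expression at all. The second shows the more general "does the text after this position match" form through `re.Pattern.match(string, pos)`.

```python
import re
import unicodedata


def is_space_at(text: str, pos: int) -> bool:
    """Is the code point at `pos` white-space, per Python's Unicode tables?"""
    return text[pos].isspace()


def matches_at(pattern: re.Pattern, text: str, pos: int) -> bool:
    """Does the text starting at `pos` match `pattern`?"""
    return pattern.match(text, pos) is not None


WHITESPACE = re.compile(r"\s")   # Unicode-aware for str patterns

text = "a\u00a0b"   # "a", NO-BREAK SPACE, "b"
# The underlying table lookup: U+00A0 has general category "Zs".
category = unicodedata.category(text[1])
```

Both helpers still answer per code point, which is exactly the granularity under dispute in this thread.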

> Anyway, this is all irrelevant.  Until you specify an actual
> "character API" and propose a practical implementation strategy,
> then I think the discussion is pointless.

I see.  I think I already had.  Was that missing?  I'm content with the
Scheme character API, with the case-related functions fixed or
removed.
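One well-known problem with character-level case functions, offered as an illustration only (the message does not say which defects it has in mind): Scheme's `char-upcase` maps one character to one character, but Unicode's full case mappings are not one-to-one. Python's `str.upper` and `str.lower` apply the full mappings, so they show results that a char-to-char signature cannot express. `upcase_per_char` is a hypothetical one-to-one upcase that leaves a character alone when its uppercase form is longer than one code point.

```python
# German sharp s uppercases to two letters, so no char -> char function
# can return the full result.
assert "\u00df".upper() == "SS"

# Capital I with dot above lowercases to "i" + COMBINING DOT ABOVE.
assert "\u0130".lower() == "i\u0307"


def upcase_per_char(s: str) -> str:
    """Hypothetical one-to-one upcase: keep a character unchanged when its
    uppercase form is not a single code point."""
    return "".join(c.upper() if len(c.upper()) == 1 else c for c in s)
```

Applied to "straße", the string-level `upper` gives "STRASSE", while the per-character version can only give "STRAßE".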

Thomas