
Re: Why are byte ports "ports" as such?

This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.



Thomas Bushnell BSG wrote:
 > I can't tell what you're arguing for.

We *do* have something we can call characters: characters.  You might
find them useless, but their semantics are quite clear.

Maybe in your universe.

Which of the following are you arguing for:

1) We should have neither code points nor characters;
2) We should have code points and not characters, and call code points
   something like "code-points";
3) We should have code points and not characters, and call code points
   something like "characters";
4) We should have both code points and characters, call code points
   something like "characters" and call characters something else.

If you are arguing (1), then fine, let's drop both.  If you are
arguing (3) or (4), there is no defense for your position.

That's very arrogant.  I'm arguing for (3).  Most other programming
languages have chosen this solution, because it works.  I don't know
of any that have implemented "character" (in your sense) as a primitive
data type, so it is up to you to explain how to do it.

What does char->integer return?  How does char<? work?  What is your
proposed implementation for a "character" in the Unicode world, given
that it is not a code-point?  How would you store characters in a
string?
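
To make the char->integer question concrete, here is a small sketch in
Python rather than Scheme. It is not taken from the SRFI or from either
poster. It only shows that one user-perceived character can correspond to
one code point or to several, which is why a single integer result is easy
to define for code points and harder for "characters" in the other sense.
The helper name code_points is made up for this illustration.

```python
import unicodedata

# The same user-perceived character, written two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, two code points


def code_points(text):
    """Return the list of integer code points in text (hypothetical helper)."""
    return [ord(c) for c in text]


print(code_points(precomposed))   # [233]
print(code_points(decomposed))    # [101, 769]

# The two spellings differ as code point sequences but are canonically
# equivalent: NFC normalization maps the decomposed form to the precomposed one.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

An integer conversion is well defined for each element of either list.
For the two-code-point spelling taken as a single "character", there is no
obvious single integer to return unless the implementation normalizes first
or invents its own numbering.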

Storage is irrelevant.  An implementation would be free to store
characters however it wished.  char->integer and char<? can return
whatever the implementation pleases.  I would rather drop them, since
they have nothing really to do with characters.  They are functions on
*code points*, which are there because the R5RS authors did not bother
to distinguish code points from characters.

I'm asking how *you* would implement a "character" data type.
Assume you have 32-bit "scheme values".  Would you make characters
immediate/unboxed values?  In that case, assume you have 28 bits.
Or are characters pointers to objects in memory?  If so, how are
they managed?  Are equal characters eq?  Suppose I have a UTF-8
input file.  What does read-char do?  What is a string - an array
of 32-bit Scheme values or could it be more compact?
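
For comparison, this is roughly what the code-point answer to those
questions could look like, sketched in Python rather than Scheme. It is
only an illustration under stated assumptions, not any implementation's
actual layout: the 4-bit tag, the tag value CHAR_TAG, and the names
box_char, unbox_char and read_char are all invented here. A code point
needs at most 21 bits, so it fits in a 28-bit immediate payload, and equal
characters are trivially the same machine word.

```python
import io

TAG_BITS = 4               # assumption: low 4 bits of a 32-bit word hold a type tag
CHAR_TAG = 0b1010          # assumption: arbitrary tag value meaning "character"
TAG_MASK = (1 << TAG_BITS) - 1
MAX_CODE_POINT = 0x10FFFF  # largest Unicode code point (needs 21 bits)


def box_char(code_point):
    """Pack a code point into a tagged 32-bit immediate value."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return (code_point << TAG_BITS) | CHAR_TAG


def unbox_char(word):
    """Recover the code point from a tagged immediate value."""
    if word & TAG_MASK != CHAR_TAG:
        raise TypeError("not a character value")
    return word >> TAG_BITS


def read_char(port):
    """Read one UTF-8 encoded code point from a binary stream.

    Returns a tagged character value, or None at end of input.
    """
    first = port.read(1)
    if not first:
        return None
    lead = first[0]
    # The lead byte determines how many continuation bytes follow.
    if lead < 0x80:
        extra = 0
    elif lead >> 5 == 0b110:
        extra = 1
    elif lead >> 4 == 0b1110:
        extra = 2
    elif lead >> 3 == 0b11110:
        extra = 3
    else:
        raise ValueError("invalid UTF-8 lead byte")
    data = first + port.read(extra)
    # Strict decoding rejects truncated or malformed sequences.
    return box_char(ord(data.decode("utf-8")))


port = io.BytesIO("a\u00e9\u20ac".encode("utf-8"))
values = [read_char(port) for _ in range(4)]
print([None if v is None else hex(unbox_char(v)) for v in values])
```

Under this scheme a string could be an array of such words, or something
more compact such as the original UTF-8 bytes. The sketch takes no position
on that, and it does not attempt the harder case where a "character" spans
several code points.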
--
	--Per Bothner
per@bothner.com   http://per.bothner.com/