> From: Alex Shinn <foof@xxxxxxxxxxxxx>
> Shiro's proposal is well thought out, handles encoding simply,
> and is based on real working practice in Gauche.
> The main complication is that Scheme strings don't necessarily
> have anything to do with C strings. Shared substrings, in fact,
> are not C strings as already acknowledged by the API and strings
> as lists or Boehm cords aren't even consecutive memory
> references. Handle these issues and the only thing left for
> Unicode is to specify the default encoding (and an advanced SRFI
> could specify fetching w/ alternate encodings for efficiency).
My own thinking in this area isn't fully cooked yet but let me make a
few general observations.
* portable FFI vs. native FFI
It's worth keeping clear the difference between an FFI for writing
code portable across multiple implementations vs. an FFI exposing
the full glory of a particular implementation.
In a portable FFI, we can tolerate moderate inefficiencies, loss of
generality, and all kinds of sins -- just so long as the result
really is portable and really is enough to write useful code
in a large number of cases.
In terms of strings, I like the idea of ALLOCATE_COPY_OF rather than
EXTRACT: function(s) that give you copies of strings or parts of
strings, in whatever encoding you like (from a small set), but
which don't share state with the actual Scheme string and do have
to be explicitly freed.
That's at least enough to be able to, for example, get the name of a
file you're supposed to open.
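To make the ALLOCATE_COPY_OF idea concrete, here is a rough sketch in C. Everything in it is an assumption for illustration only: `struct toy_string` is a hypothetical stand-in for whatever a Scheme implementation really uses, and `toy_allocate_copy_utf8` is an invented name, not something from the SRFI-50 draft. The point is only the contract: the caller gets a fresh UTF-8 copy that shares no state with the Scheme string and must free it explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Scheme string: an array of code points.
   A real implementation might use a rope, a list, or a shared substring. */
struct toy_string {
    const uint32_t *cps;  /* Unicode scalar values, assumed valid */
    size_t len;           /* number of code points */
};

/* Encode one code point as UTF-8 into out; return bytes written (1..4). */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Return a malloc'd, NUL-terminated UTF-8 copy of code points
   [start, end) of s, or NULL on a bad range or allocation failure.
   The copy shares nothing with s; the caller must free() it. */
char *toy_allocate_copy_utf8(const struct toy_string *s,
                             size_t start, size_t end)
{
    if (s == NULL || start > end || end > s->len)
        return NULL;

    size_t max_bytes = (end - start) * 4;  /* worst case: 4 bytes each */
    unsigned char *buf = malloc(max_bytes + 1);
    if (buf == NULL)
        return NULL;

    size_t n = 0;
    for (size_t i = start; i < end; i++)
        n += encode_utf8(s->cps[i], buf + n);

    assert(n <= max_bytes);
    buf[n] = '\0';
    return (char *)buf;
}
```

A C function that needs a file name would call this, hand the result to fopen or similar, and then free() it. It is not efficient, but it never depends on how the Scheme string is laid out in memory.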
* indexes are a total nightmare
Let's suppose a C function wants to hand Scheme the return value of
mb_strlen. Or that Scheme wants to hand C a "string index".
Total train wreck.
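A small illustration of why, assuming the C side holds valid UTF-8 and the Scheme side counts in code points (both assumptions, and the function names are made up for this sketch): the same position in the same text has two different numeric values, and converting between them takes a scan of the string.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string (assumed valid)
   by counting the bytes that are not continuation bytes (10xxxxxx). */
size_t utf8_codepoint_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Byte offset at which code point number `index` starts.
   Returns the byte length of s if index is at or past the end.
   This is a linear scan from the start of the string. */
size_t utf8_byte_offset(const char *s, size_t index)
{
    assert(s != NULL);
    size_t off = 0;
    while (s[off] != '\0') {
        if (((unsigned char)s[off] & 0xC0) != 0x80) {
            if (index == 0)
                return off;
            index--;
        }
        off++;
    }
    return off;
}
```

For a string whose first two characters take two and three bytes in UTF-8, code point index 2 is byte offset 5. An integer that crosses the FFI boundary carries no hint of which of the two it is, and that is before considering other encodings or implementations that index by something else entirely.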
* the real problem is C and C libraries
The standard C facilities for large character sets are fairly lame.
The de facto standard practice of using UTF-8 for everything is
limiting. Indeed, there are no standard libraries for things such
as ropes, edit buffers, and so forth.
It's beyond the scope of SRFI-50 but I think that in the longer
term, as we build these next generation Schemes with good Unicode
support, an interesting possibility is to aim for a run-time system
that doubles as a next-generation C library for Unicode text