This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.
> From: Alex Shinn <foof@xxxxxxxxxxxxx>

> Shiro's proposal is well thought out, handles encoding simply,
> and is based on real working practice in Gauche.

> The main complication is that Scheme strings don't necessarily
> have anything to do with C strings.  Shared substrings, in fact,
> are not C strings as already acknowledged by the API and strings
> as lists or Boehm cords aren't even consecutive memory
> references.  Handle these issues and the only thing left for
> Unicode is to specify the default encoding (and an advanced SRFI
> could specify fetching w/ alternate encodings for efficiency).

My own thinking in this area isn't fully cooked yet but let me
make a few general observations.

* portable FFI vs. native FFI

  It's worth keeping clear the difference between an FFI for
  writing code portable across multiple implementations vs. an FFI
  exposing the full glory of a particular implementation.

  In a portable FFI, we can tolerate moderate inefficiencies, loss
  of generality, and all kinds of sins -- just so long as the
  result really is portable and really is enough to write useful
  code in a large number of cases.

  In terms of strings, I like the idea of ALLOCATE_COPY_OF rather
  than EXTRACT: function(s) that give you copies of strings or
  parts of strings, in whatever encoding you like (from a small
  set), but which don't share state with the actual Scheme string
  and do have to be explicitly freed.  That's at least enough to
  be able to, for example, get the name of a file you're supposed
  to open.

* indexes are a total nightmare

  Let's suppose a C function wants to hand Scheme the return value
  of mb_strlen.  Or that Scheme wants to hand C a "string index".
  Total train wreck: nothing guarantees that the two sides count
  in the same units (bytes, code units, or characters).

* the real problem is C and C libraries

  The standard C facilities for large character sets are fairly
  lame.  The de facto standard practice of using UTF-8 for
  everything is limiting.  Indeed, there are no standard libraries
  for things such as ropes, edit buffers, and so forth.
It's beyond the scope of SRFI-50 but I think that in the longer term, as we build these next generation Schemes with good Unicode support, an interesting possibility is to aim for a run-time system that doubles as a next-generation C library for Unicode text manipulation.

-t
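As a rough illustration of the ALLOCATE_COPY_OF idea mentioned above, here is a minimal sketch in C. Everything in it is hypothetical and is not part of SRFI-50 or of any implementation: `toy_scheme_string` stands in for whatever a real Scheme uses internally (modelled here as a flat array of code points), and `toy_allocate_utf8_copy` is an invented name. It returns a fresh UTF-8 copy of a range of characters that shares no state with the original and must be released with `free()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Scheme string: a flat array of code
   points.  A real implementation might use shared substrings,
   lists, or cords instead. */
typedef struct {
    const unsigned long *code_points;
    size_t length;                      /* length in characters */
} toy_scheme_string;

/* Encode one code point as UTF-8.  If out is NULL, only count.
   Returns the number of bytes, or 0 for an invalid code point. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {
        if (out) out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        if (out) {
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        }
        return 2;
    }
    if (cp < 0x10000UL) {
        if (cp >= 0xD800UL && cp <= 0xDFFFUL) return 0; /* surrogates */
        if (out) {
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        }
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        if (out) {
            out[0] = (unsigned char)(0xF0 | (cp >> 18));
            out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        }
        return 4;
    }
    return 0;
}

/* Hypothetical ALLOCATE_COPY_OF-style function: returns a newly
   malloc'd, NUL-terminated UTF-8 copy of characters [start, end).
   The copy shares no state with s; the caller must free() it.
   Returns NULL on a bad range, an invalid code point, or
   allocation failure.  If byte_len_out is non-NULL it receives the
   length of the copy in bytes (excluding the terminator). */
char *toy_allocate_utf8_copy(const toy_scheme_string *s,
                             size_t start, size_t end,
                             size_t *byte_len_out)
{
    size_t i, n, total = 0;
    unsigned char *buf, *p;

    assert(s != NULL);
    if (start > end || end > s->length) return NULL;

    /* First pass: validate and measure. */
    for (i = start; i < end; i++) {
        n = utf8_encode(s->code_points[i], NULL);
        if (n == 0) return NULL;
        total += n;
    }

    buf = malloc(total + 1);
    if (buf == NULL) return NULL;

    /* Second pass: write the bytes. */
    p = buf;
    for (i = start; i < end; i++)
        p += utf8_encode(s->code_points[i], p);
    *p = '\0';

    if (byte_len_out) *byte_len_out = total;
    return (char *)buf;
}
```

The sketch also shows the index problem in miniature. For the five-character string "naïve" (code points 0x6E, 0x61, 0xEF, 0x76, 0x65) the copy is six bytes long, so a character index on the Scheme side and a byte offset on the C side already disagree after the third character.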