This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.
I think it's important to work well with C as it is, and not try to correct C's problems in the SRFI API.

C doesn't specify a particular character set. It distinguishes the character set used in the source code (the "source character set") from the character set used in the running program (the "execution character set"), and specifies a limited set of characters that both must contain (the "basic character set").

The simplest and easiest ways to access Scheme strings' contents and Scheme characters should return the corresponding strings and characters in the C program's execution character set. That is:

- Extracting the first character of the Scheme string "z" had better yield the C character 'z'.

- I shouldn't have to mention any character set by name, or do any kind of character set conversions at all, to write C code that checks whether a given string's contents are "foo".

Implementing this behavior can't be a burden on the Scheme implementation, since it had to get all the data from the outside system anyway, and it was almost certainly already in the program's C execution character set when it arrived. So if the Scheme system doesn't actually use the C execution character set itself, it must already have mechanisms for converting to and from that character set.

The next case to support is the "Scheme execution character set", where you just return the data in whatever form is cheapest and easiest for the Scheme system, once you've flattened it out into an array of characters or wide characters. You can't assume any relationship between this form and the C program's execution character set, of course, but you can at least pass it through without paying for conversions you don't need, or wondering if the round trip is going to munge anything. And, when you don't care about writing code portable to other Schemes, you can operate on the data directly.
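To make the two cases concrete, here is a rough sketch of what the caller's side might look like. Everything named here is hypothetical: scm_string, scm_extract_c_string, and scm_extract_raw are stand-ins invented for illustration, not part of the SRFI 50 draft or of any real Scheme. The sketch assumes a Scheme that happens to store strings as wide characters, and it uses the standard wcstombs as the conversion such a system would already need to have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a Scheme string object. Assumption: this
   Scheme keeps its text as a null-terminated array of wide characters. */
typedef struct {
    const wchar_t *chars;
} scm_string;

/* Hypothetical accessor, first case: copy the contents into buf, converted
   to the C execution character set. Returns 0 on success, -1 if some
   character cannot be converted or buf is too small. */
static int scm_extract_c_string(const scm_string *s, char *buf, size_t size)
{
    assert(s != NULL && buf != NULL && size > 0);
    size_t n = wcstombs(buf, s->chars, size);
    /* n == size means the output filled buf with no room for the
       terminator; when n < size, wcstombs has already written it. */
    if (n == (size_t)-1 || n >= size)
        return -1;
    return 0;
}

/* Hypothetical accessor, second case: hand back the Scheme system's own
   representation untouched. No conversion, and no promise about how it
   relates to the C execution character set. */
static const wchar_t *scm_extract_raw(const scm_string *s)
{
    return s->chars;
}

/* Caller side: no character set is named and no conversion is requested. */
static int scm_string_is_foo(const scm_string *s)
{
    char buf[16];
    return scm_extract_c_string(s, buf, sizeof buf) == 0
        && strcmp(buf, "foo") == 0;
}

/* Returns the first character as a C character value, or -1 on failure. */
static int scm_first_char(const scm_string *s)
{
    char buf[16];
    if (scm_extract_c_string(s, buf, sizeof buf) != 0)
        return -1;
    return (unsigned char)buf[0];
}
```

The point of the sketch is the shape of the last two functions: they compare against an ordinary C string literal and an ordinary C character constant, and never mention an encoding. The raw accessor is the cheap pass-through for code that only needs to hand the data back to the same Scheme.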
Only after those two cases are covered should one move on to providing ways to reliably get UTF-8, UCS-4, or whatever you like.

That's my two cents, anyway.