This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.
On Friday 23 January 2004 07:56 pm, Per Bothner wrote:
> Ken Dickey wrote:
> > It would be a *bad thing* if going from one locale to another changed
> > a working Scheme program into a broken Scheme program.
>
> Huh? What do you mean? How can a source file containing Scheme
> source code possibly be locale independent? What if you're on
> a system whose native encoding is EBCDIC? What if you use
> non-ASCII characters in string literals or symbols?

I mean that if I write a Scheme program in Germany and move to Turkey, the source I READ should continue to have equivalent behavior, given a level of support for character sets. [My computer knows my locale.]

I should be able to query an implementation to see whether it supports a particular level of character/string support, and write programs that assume that level (be it ASCII, Unicode, EBCDIC, whatever). I should be able to write a utility using READ, WRITE, WRITE-CHAR et al. which translates between character sets [ASCII, Unicode, EBCDIC]. I should be able to write a Scheme program whose source is ASCII but which deals with Unicode IO.

To "standardize" string and character handling beyond the limited, but very useful, subset which Scheme currently has means, to me, that we need to deal with "portability" aspects of "embedded characters in strings" as specified by READ.

One of the reasons I tend to do more math in Scheme (or Smalltalk or Common Lisp) is that I can use rationals, bignums, and complex numbers in a relatively abstract and unified way -- in implementations which support them. I expect the numeric code I write assuming such numeric support to break in systems which don't support those numeric types. I find that such code does work as I expect in a large number of implementations. I am happy to write programs in which identifiers are limited to those characters supported today in R5RS.
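[A minimal sketch of the kind of translating utility I have in mind, using only READ-CHAR/WRITE-CHAR; the conversion procedures CHAR->CODE and CODE->CHAR are hypothetical hooks supplied by the caller, not standard procedures:

```
;; Copy IN to OUT, translating each character through a pair of
;; caller-supplied conversion procedures (e.g. ASCII <-> EBCDIC tables).
;; CHAR->CODE and CODE->CHAR are illustrative names, not standard.
(define (translate-port in out char->code code->char)
  (let loop ((c (read-char in)))
    (if (not (eof-object? c))
        (begin
          (write-char (code->char (char->code c)) out)
          (loop (read-char in))))))
```

The point is that such a utility only depends on the character-level guarantees the implementation advertises, not on the locale of the machine it happens to run on.]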
But I would like to be able to manipulate Unicode strings natively -- even if as a separate datatype from current strings (I assume conversion/mapping functions). I am satisfied if STRING->SYMBOL signals an error when non-ASCII characters are used.

So in the "weak" case, I would support a new UNICODE-STRING datatype SRFI and a reasonable set of operations with well-specified interactions with strings as currently defined. I see no reason that this could not be done as a library, with little impact on R6RS and no need to codify such a standard prior to wide experience of its consequences.

[Comments? I know you have comments! 8^]

Cheers,
-KenD
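[To make the "weak" case concrete, here is a hypothetical surface for such a UNICODE-STRING library; every name below is illustrative, not drawn from any existing SRFI:

```
(unicode-string? obj)            ; type predicate, disjoint from STRING?
(string->unicode-string str)     ; lossless: every native char has a code point
(unicode-string->string ustr)    ; may signal an error if a code point
                                 ; has no representation as a native char
(unicode-string-ref ustr k)      ; yields a code point as an exact integer
(unicode-string-length ustr)     ; length in code points, not bytes
```

The interaction rules with current strings reduce to specifying exactly when the two conversion procedures may signal an error, which is the kind of thing that wants implementation experience before standardization.]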