Re: Parsing Scheme [was Re: strings draft]
On Friday 23 January 2004 07:56 pm, Per Bothner wrote:
> Ken Dickey wrote:
> > It would be a *bad thing* if going from one locale to another changed
> > a working Scheme program into a broken Scheme program.
> Huh? What do you mean? How can a source file containing Scheme
> source code possibly be locale independent? What if you're on
> a system whose native encoding is EBCDIC? What if you use
> non-ASCII characters in string literals or symbols?
I mean that if I write a Scheme program in Germany and move to Turkey, the
source I READ should continue to have equivalent behavior, given a level
of support for character sets. [My computer knows my locale]. I should be
able to query an implementation to see if an implementation supports a
particular level of character/string support and write programs that assume
that level (be it ASCII, Unicode, EBCDIC, whatever). I should be able to
write a utility using READ, WRITE, WRITE-CHAR et al which translates between
character sets [ASCII, Unicode, EBCDIC]. I should be able to write a Scheme
program whose source is ASCII which deals with Unicode IO.
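
[A minimal sketch of such a translation utility, using only R5RS READ-CHAR and WRITE-CHAR. The names TRANSLATE-CHAR and TRANSLATE-PORT and the association-list table format are hypothetical, and the sketch assumes the implementation can already represent both the source and target characters as Scheme characters.]

```scheme
;; Hypothetical sketch: character-by-character translation between ports.
;; table is an association list of (source-char . target-char) pairs.
;; Characters with no entry in the table pass through unchanged.
(define (translate-char c table)
  (let ((entry (assv c table)))
    (if entry (cdr entry) c)))

;; Copy every character from input port IN to output port OUT,
;; translating each one through TABLE, until end of file.
(define (translate-port in out table)
  (let loop ((c (read-char in)))
    (if (not (eof-object? c))
        (begin
          (write-char (translate-char c table) out)
          (loop (read-char in))))))
```

Calling (translate-port (current-input-port) (current-output-port) table) would turn this into a filter. A real character-set translator would need a full mapping table and a way to handle characters with no counterpart in the target set.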
To "standardize" string and character handling beyond the limited, but very
useful, subset which Scheme currently has means to me that we need to deal
with "portability" aspects of "embedded characters in strings" as specified
One of the reasons I tend to do more math in Scheme (or Smalltalk or
CommonLisp) is that I can use rationals, bignums and complex numbers in a
relatively abstract and unified way -- in implementations which support them.
I expect the numeric code I write assuming such numeric support to break in
systems which don't support those numeric types. I find that such code does work
as I expect in a large number of implementations.
I am happy to write programs in which identifiers are limited to those
characters supported today in R5RS. But I would like to be able to
manipulate Unicode strings natively -- even if as a separate datatype from
current strings (I assume conversion/mapping functions). I am satisfied if
STRING->SYMBOL signals an error if non-ASCII characters are used.
So in the "weak" case, I would support a SRFI for a new UNICODE-STRING
datatype and a reasonable set of operations which have well-specified
interactions with strings as currently defined.
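
[A minimal sketch of how the "weak" case might look as a library, in plain R5RS. Every name here is hypothetical and comes from no existing SRFI. The sketch represents a Unicode string as a tagged vector of integer code points, and it assumes CHAR->INTEGER yields ASCII codes for the characters involved, which R5RS does not guarantee. Because R5RS has no portable way to signal an error, the conversion back to an ordinary string returns #f instead.]

```scheme
;; Hypothetical sketch: a Unicode string as a tagged vector of code points.
;; A unique tag object distinguishes unicode-strings from other pairs.
(define unicode-string-tag (list 'unicode-string))

;; code-points is a list of non-negative integers.
(define (make-unicode-string code-points)
  (cons unicode-string-tag (list->vector code-points)))

(define (unicode-string? obj)
  (and (pair? obj) (eq? (car obj) unicode-string-tag)))

(define (unicode-string-length us)
  (vector-length (cdr us)))

(define (unicode-string-ref us k)
  (vector-ref (cdr us) k))

;; ASSUMPTION: char->integer returns the ASCII code for each character.
(define (string->unicode-string s)
  (make-unicode-string (map char->integer (string->list s))))

;; Returns an ordinary string, or #f if any code point lies outside ASCII.
(define (unicode-string->string us)
  (let loop ((i (- (unicode-string-length us) 1)) (chars '()))
    (cond ((< i 0) (list->string chars))
          ((> (unicode-string-ref us i) 127) #f)
          (else (loop (- i 1)
                      (cons (integer->char (unicode-string-ref us i))
                            chars))))))
```

The two conversion procedures are the only points of contact with strings as currently defined, so the interaction between the two datatypes stays explicit.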
I see no reason that this could not be done as a library with little impact on
R6RS and no need to codify such a standard prior to a wide experience of
[Comments? I know you have comments! 8^]