
Re: Parsing Scheme [was Re: strings draft]

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015.

On Friday 23 January 2004 07:56 pm, Per Bothner wrote:
> Ken Dickey wrote:
> > It would be a *bad thing* if in going from one locale to another changed
> > a working Scheme program into a broken Scheme program.

> Huh?  What do you mean?  How can a source file containing Scheme
> source code possibly be locale independent?  What if you're on
> a system whose native encoding is EBCDIC?  What if you use
> non-ASCII characters in string literals or symbols?

I mean that if I write a Scheme program in Germany and move to Turkey, the 
source I READ should continue to have equivalent behavior, given a level of 
support for character sets.  [My computer knows my locale].  I should be able 
to query an implementation to see whether it supports a particular level of 
character/string support and write programs that assume that level (be it 
ASCII, Unicode, EBCDIC, whatever).  I should be able to write a utility using 
READ, WRITE, WRITE-CHAR et al which translates between character sets [ASCII, 
Unicode, EBCDIC].  I should be able to write a Scheme program whose source is 
ASCII which deals with Unicode IO.
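A minimal sketch of the kind of query I have in mind.  CHAR-SUPPORT-LEVEL is 
entirely hypothetical -- no such procedure exists in R5RS or any SRFI; it is 
stubbed here purely for illustration:

```scheme
;; Hypothetical feature query.  An implementation might answer
;; 'ascii, 'ebcdic, or 'unicode; the stub below just answers 'ascii.
(define (char-support-level) 'ascii)

;; A program could then choose its strategy up front:
(case (char-support-level)
  ((unicode) 'use-native-unicode-strings)
  ((ascii)   'restrict-to-ascii-literals)
  (else      'portable-fallback))
```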

To "standardize" string and character handling beyond the limited, but very 
useful, subset which Scheme currently has means to me that we need to deal 
with "portability" aspects of "embedded characters in strings" as specified 
by READ.

One of the reasons I tend to do more math in Scheme (or Smalltalk or 
CommonLisp) is that I can use rationals, bignums and complex numbers in a 
relatively abstract and unified way -- in implementations which support them.  

I expect the numeric code I write assuming such numeric support to break in 
systems which don't support those numeric types.  I find that such code does 
work as I expect in a large number of implementations.

I am happy to write programs in which identifiers are limited to those 
characters supported today in R5RS.  But I would like to be able to 
manipulate Unicode strings natively -- even if as a separate datatype from 
current strings (I assume conversion/mapping functions).  I am satisfied if 
STRING->SYMBOL signals an error if non-ASCII characters are used.
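For instance, a portable guard along these lines would satisfy me.  
ASCII-STRING->SYMBOL is a made-up name, and ERROR is assumed from the host 
implementation since R5RS does not specify it:

```scheme
;; Intern a string as a symbol only if every character is ASCII;
;; otherwise signal an error, as I'd be satisfied for
;; STRING->SYMBOL itself to do.
(define (ascii-string->symbol s)
  (let loop ((i 0))
    (cond ((= i (string-length s)) (string->symbol s))
          ((< (char->integer (string-ref s i)) 128) (loop (+ i 1)))
          (else (error "non-ASCII character in symbol name:" s)))))
```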

So in the "weak" case, I would support a new, UNICODE-STRING datatype SRFI and 
reasonable set of operations which has well specified interactions with 
strings as currently defined.
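A sketch of what such a datatype might look like, representing a Unicode 
string as a vector of code points.  Every name here is hypothetical -- the 
point is only that the new type and its conversions to/from current strings 
can be specified without touching the existing string operations:

```scheme
;; Hypothetical UNICODE-STRING: a tagged vector of code points.
(define (string->unicode-string s)
  (let ((v (make-vector (string-length s))))
    (do ((i 0 (+ i 1)))
        ((= i (string-length s)) (list 'unicode-string v))
      (vector-set! v i (char->integer (string-ref s i))))))

;; Indexed access returns a code point, not a CHAR, so the new type
;; need not enlarge the existing character datatype.
(define (unicode-string-ref us i)
  (vector-ref (cadr us) i))

(define (unicode-string-length us)
  (vector-length (cadr us)))
```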

I see no reason that this could not be done as a library with little impact on 
R6RS, and no need to codify such a standard prior to wide experience of its 
consequences.

[Comments?  I Know you have comments!  8^]