
A different approach

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.




[I'm going by the recent summary of the discussion so far and some of the subsequent posts.  I apologize if I'm re-raising an already discarded idea.]

This seems to be the main problem (from the Issues section):
The extension of symbol syntax to include all non-whitespace characters above Unicode 127 may be too liberal.

Considering the goals of the editors (portability, progress, enabling and encouraging experimentation), here is an alternative:

1. Leave identifier syntax unmodified.
    Implementors who want Unicode identifier and symbol names can add name mangling to their parsers and pretty-printers.

2. Require that integer->char is total for all assigned Unicode code points < 2^16
    8-bit-char-only implementations can treat an attempt to read or create a large character as a resource-exhaustion event.

3. Yes, define the character case mapping functions in terms of CaseMappings.txt
    That is a well-balanced choice -- even fancy implementations need that functionality and even simple implementations can afford to provide it.  Meanwhile, many interesting programs can be written using nothing more.

4. Strings must be able to hold any character

5. string->symbol for strings containing code points that do not fit in 8 bits is undefined
    It has never been great style in Scheme, even if strictly portable, to write programs which assume that string->symbol and symbol->string define a 1:1 relationship between the two types.

6. the behavior of read is undefined if it encounters a non-8-bit character
and
7. write will never emit a non-8-bit character
    Because, just as any portable program should run correctly and usefully in any sufficiently standard implementation, if any two programs running in two different implementations are communicating, the aggregate should still run correctly and usefully.
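
To make point 1 concrete, here is a minimal sketch of the kind of name mangling an implementor's reader might apply.  The encoding itself ("_u" followed by the code point in hex and a closing "_") and the names mangle-char, mangle-name and unicode-name->symbol are made up for illustration; nothing in the draft specifies them.  The sketch assumes an implementation whose strings can already hold the characters in question (point 4).

```scheme
;; Hypothetical mangling scheme, for illustration only: every character
;; whose code point is above 127 becomes "_u<hex>_"; ASCII characters
;; pass through unchanged.  A real scheme would also have to escape
;; literal "_u" sequences so that the mapping can be reversed by the
;; pretty-printer.

(define (mangle-char c)
  (let ((n (char->integer c)))
    (if (< n 128)
        (string c)
        (string-append "_u" (number->string n 16) "_"))))

(define (mangle-name s)
  ;; Mangle each character and concatenate the pieces.
  (apply string-append (map mangle-char (string->list s))))

;; A Unicode-aware reader could intern only the mangled name, so that
;; string->symbol is never applied to a non-8-bit character (point 5).
(define (unicode-name->symbol s)
  (string->symbol (mangle-name s)))
```

With something like this, an identifier written as "a" followed by U+03BB is interned under an ASCII-only name, while ordinary names such as "car" are left untouched.  A matching pretty-printer would apply the inverse mapping when printing symbols.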


I believe that this solution would enable portable Unicode programs, set a nice stage for the first portable Scheme Unicode-text-manipulation libraries, be easy for many implementors to achieve, be a good foundation for eventual support of Unicode-enabled identifiers and, finally, would answer the objections people have been bringing up about the draft recently (that it somehow precludes an implementor from making very nice Unicode support).


-t