This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
Hi Robby,

Sorry, but I'm afraid I'm missing your point. As far as I can see, the only ambiguity created by continuing to restrict the composition of Scheme code to the "truly portable" character subset it uses today is the lack of a formal specification for expressing and displaying characters and strings drawn from beyond that portable set. (Scheme programs written in portable characters can already be converted, displayed, and edited on most known present and future hosts.)

For all practical purposes that doesn't seem like much of a problem, because Scheme already permits arbitrary character and byte values to be expressed and displayed numerically. Those values are of course specific to whatever character encoding an implementation chooses, which the standard fortunately does not specify. That lets an implementation adopt the encoding assumed by its host environment; lets character and raw-byte storage and I/O sequences be treated as equivalent; and so lets character strings and ports store, and exchange with the environment, data in whatever encoding an arbitrary purpose may require. (Actually a fairly flexible scheme.)

For the sake of argument, where it is desirable to express extended-character-set characters that cannot be displayed portably, why not simply spell out their names within Scheme's portable character set, just as #\space is spelled out? For example:

  #\uc:ezet, #\uc:some-chinese-character-name, #\uc:pi, etc.

or:

  (uc 'ezet), (uc 'some-chinese-character-name), (uc 'pi), etc.

I believe these characters have already been named and spelled in Unicode's documentation.
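The (uc 'name) idea above could be sketched as a simple lookup procedure. This is only an illustrative sketch in R7RS Scheme: the table, the name spellings, and the `uc` procedure itself are hypothetical, not anything Unicode or SRFI 52 actually defines.

```scheme
;; Hypothetical sketch of a (uc 'name) character lookup.
;; The names below are illustrative abbreviations, not Unicode's
;; official character names.
(define uc-table
  (list (cons 'ezet  (integer->char #x00DF))  ; LATIN SMALL LETTER SHARP S
        (cons 'pi    (integer->char #x03C0))  ; GREEK SMALL LETTER PI
        (cons 'space #\space)))

(define (uc name)
  ;; Look the symbol up in the table; signal an error for unknown names.
  (cond ((assq name uc-table) => cdr)
        (else (error "unknown character name" name))))
```

The point is that (uc 'pi) denotes the character U+03C0 while the source text itself contains only portable characters.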
Otherwise, complications arise when one tries to:

- Specify the character encoding format used by Scheme. That may force characters to be translated between Scheme's encoding and the one assumed by the host environment, preventing the use of Scheme's character strings and ports for arbitrarily encoded data, for which Scheme presently specifies no alternative facility.

- Specify a particular extended character set for both Scheme program code and arbitrary character data. Such a set may not translate unambiguously to and from arbitrary other character sets, and may not even be displayable or easily editable on arbitrary platforms.

Incidentally, while admitting to likely being somewhat culturally and historically biased, I have no interest (even if Unicode were ubiquitous) in trying to decipher a program composed of mixed Chinese, Japanese, English, French, Greek, Slavic, etc. identifiers and comments. If such programs were allowed to be produced, they would be basically unsupportable, and the industry would collapse upon itself. It already has enough problems maintaining code written in a single, relatively restricted language and character set, one which, for good or bad, folks in the computing world have had to become reasonably familiar with, and which unifies programmers' ability to develop, debug, and share common code. Otherwise we'll end up with a heterogeneous-language code base in which, rather than (+ 1 1 1) -> 3, we'll end up with (+ 1 1 1) -> 1, to no one's true benefit.
-paul-

> From: Robby Findler <robby@xxxxxxxxxxxxxxx>
>> At Thu, 12 Feb 2004 16:23:17 -0500, Paul Schlie wrote:
>> As Ken properly pointed out, and as should be abundantly clear to most
>> by now: attempting to enable Scheme to more conveniently process text
>> encoded in an arbitrary character set is distinctly different from
>> attempting to enable Scheme to use arbitrary characters within its
>> program identifiers and comments.
>>
>> While the first is arguably noble, the second would clearly be a mistake.
>
> I think you're missing one of the real virtues of Scheme (LISP,
> originally). As someone has already pointed out, Scheme's data is a
> very good representation for Scheme's code.
>
> Indeed, Schemers can exploit this to tremendous advantage. For example,
> imagine you wanted to write a test suite for a macro you had written,
> and in particular wanted to test that syntax errors are raised properly
> for bad inputs. In Scheme, this is merely an additional two lines in your
> testing infrastructure (one to call `expand' and one to catch the
> exception). You do not need to step out of the language or start
> scripting another instance of your compiler.
>
> Going even further, consider DrScheme. DrScheme has only one virtual
> machine, which runs DrScheme's own code and simultaneously runs the
> user's program. Scheme's code-as-data is one piece of the puzzle that
> makes this work so well.
>
> Robby
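The two-line testing idea Robby describes might be sketched roughly as follows. This is a hypothetical sketch, not DrScheme's actual mechanism: it uses R7RS `eval` and `guard` in place of DrScheme's `expand`, and it assumes an implementation where errors signalled during expansion/evaluation are catchable with `guard`, which is implementation-dependent.

```scheme
;; Rough sketch: check from inside the language that a malformed use
;; of a syntactic form signals an error.
(import (scheme base) (scheme eval))

(define (raises-error? form)
  ;; #t if evaluating FORM signals any condition, #f otherwise.
  (guard (exn (#t #t))
    (eval form (environment '(scheme base)))
    #f))

;; A well-formed `let` should evaluate; a malformed binding list
;; such as ((x)) should signal an error.
(display (raises-error? '(let ((x 1)) x)))
(newline)
(display (raises-error? '(let ((x)) x)))
(newline)
```

No separate compiler instance or external script is needed; the test suite stays entirely within Scheme.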