This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
> From: Ken Dickey <Ken.Dickey@xxxxxxxxxxxxxx>

> On Thursday 12 February 2004 06:45 pm, bear wrote:

> > > Defining valid identifier syntax such that case folding of
> > > (unnormalized) identifier literals should be sufficient.

> > > What am I missing?

> > You're missing all the tools and utilities out there that are
> > programmed with the expectation and requirement that they can
> > arbitrarily impose or change normalization forms without changing the
> > text of the documents they handle. There is no escaping this; even
> > Emacs and Notepad do it.

> Ah! So a broken language (huge tables and complex processing) must be defined
> to deal with broken tools which do not write out Unicode data in a canonical
> format.

> What about creating a tool which reads bizarre Unicode and writes it out in a
> canonical format? Then requiring portable Scheme programs to pass through
> it?

> Sounds like a service to the entire Unicode community. It could be written in
> portable Scheme and serve as a (presumably good) advertisement for the
> language.

> Don't complexify the implementation, simplify the problem!

There's a distinction and separation-of-concerns to make here. And
there's some compiler-perspective bigotry to undo. Finally, let me
try to give a new perspective on my cumulative project here.

First: let's not forget that SRFI-52 most explicitly does _not_
require _any_ degree of Unicode support from implementations. The
_only_ thing it does is to tweak the language spec in some minor ways
that are needed so that the R6RS doesn't _preclude_ a conforming
implementation from supporting Unicode. Much of the discussion that
has taken place during my absence is not really focused on SRFI-52
issues -- but on issues raised in the "preview proto-SRFIs" that I've
published at the same time. (It's _fine_ (good even) to host that
discussion here. Very appropriate. But let's not conflate the
proposals of those other proto-SRFIs with the very conservative
content of (real-)SRFI-52.)

Second: it's just not realistic to punt the complexities of Unicode by
saying that Scheme code needs to pass through a canonicalizing filter.
There's the question of READ and its correlates -- consideration of
source code only is not sufficient. S-expressions have to grow up to
be a real exchange format or else Scheme (and lisp generally) sucks.

Third -- the project here: R6RS is not going to be "Unicode Scheme",
in my opinion. Nor should any R^NRS for any value of N. There ought
to be a "Unicode Scheme Standard" -- to facilitate both data and code
exchange -- but it should be layered. Human language is not essential
to computing: not a topic for R^NRS, ever. (A small subset of ASCII,
on the other hand, is "of the essence" :-)

-t

p.s.: it is naive to believe that the Unicode community is suffering
for the lack of canonicalization filters. At the same time, it is
a healthy example of "philology recapitulates..." that we've
arrived at wondering if and how we want one in this context.