Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.





    > From: Ken Dickey <Ken.Dickey@xxxxxxxxxxxxxx>

    > On Thursday 12 February 2004 06:45 pm, bear wrote:
    > > > Defining valid identifier syntax such that case folding of
    > > > (unnormalized) identifier literals should be sufficient.

    > > > What am I missing?

    > > You're missing all the tools and utilities out there that are
    > > programmed with the expectation and requirement that they can
    > > arbitrarily impose or change normalization forms without changing the
    > > text of the documents they handle.  There is no escaping this; even
    > > Emacs and Notepad do it.

    > Ah!  So a broken language (huge tables and complex processing) must be defined 
    > to deal with broken tools which do not write out Unicode data in a canonical 
    > format.

    > What about creating a tool which reads bizarre Unicode and writes it out in a 
    > canonical format?  Then requiring portable Scheme programs to pass through 
    > it?  

    > Sounds like a service to the entire Unicode community.  It could be written in 
    > portable Scheme and serve as a (presumably good) advertisement for the 
    > language.

    > Don't complexify the implementation, simplify the problem!

There's a distinction and separation-of-concerns to make here.  And
there's some compiler-perspective bigotry to undo.  Finally, let me
try to give a new perspective on my cumulative project here.

First: let's not forget that SRFI-52 most explicitly does _not_
require _any_ degree of Unicode support from implementations.  The
_only_ thing it does is to tweak the language spec in some minor ways
that are needed so that the R6RS doesn't _preclude_ a conforming
implementation from supporting Unicode.   Much of the discussion that
has taken place during my absence is not really focused on SRFI-52
issues -- but on issues raised in the "preview proto-SRFIs" that I've
published at the same time.

(It's _fine_ (good even) to host that discussion here.  Very
appropriate.  But let's not conflate the proposals of those other
proto-SRFIs with the very conservative content of (real-)SRFI-52.)

Second: it's just not realistic to punt the complexities of Unicode by
saying that Scheme code needs to pass through a canonicalizing filter.
There's the question of READ and its correlates -- consideration of
source code only is not sufficient.  S-expressions have to grow up to
be a real exchange format or else Scheme (and lisp generally) sucks.
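
A minimal sketch of the READ problem, assuming Python's standard
unicodedata module stands in for whatever normalization support an
implementation might provide.  The function names are hypothetical and
the sample string is made up for illustration.  It shows the same
visible name arriving in two normalization forms, as it might from two
different tools or from data read at run time: case folding alone
treats them as different, while folding combined with canonical
decomposition (NFD, fold, NFD again) treats them as the same.

```python
import unicodedata


def fold_only(identifier: str) -> str:
    """Case-fold without normalizing (hypothetical helper)."""
    return identifier.casefold()


def canonical_caseless_key(identifier: str) -> str:
    """Decompose, case-fold, then decompose again (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", identifier)
    return unicodedata.normalize("NFD", decomposed.casefold())


# The same visible name as two different tools might write it out.
precomposed = "caf\u00e9"   # e-acute as a single code point (NFC form)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (NFD form)

# Folding alone does not reconcile the two spellings.
print(fold_only(precomposed) == fold_only(decomposed))  # False

# Normalizing as part of the comparison does.
print(canonical_caseless_key(precomposed) == canonical_caseless_key(decomposed))  # True
```

A filter applied to source files would fix only the first kind of
input; a string that reaches READ from a file, socket, or user at run
time never passes through it.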

Third -- the project here:  R6RS is not going to be "Unicode Scheme",
in my opinion.   Nor should any R^NRS for any value of N.  There ought
to be a "Unicode Scheme Standard" -- to facilitate both data and code
exchange -- but it should be layered.   Human language is not
essential to computing: not a topic for R^NRS, ever.

(A small subset of ASCII, on the other hand, is "of the essence" :-)

-t

p.s.: it is naive to believe that the Unicode community is suffering
for the lack of canonicalization filters.   At the same time, it is a
healthy example of "philology recapitulates..." that we've arrived at
wondering if and how we want one in this context.