
on waste-of-time arguments....



John Cowan's misunderstanding is critical (and TB's reply is well
wide of the mark).  John wrote:

> SRFI-75 in no way prevents that.  It simply says what string<? and
> its friends mean.  You can still provide string-uca-simple<? and
> string-uca-locale<? if you want.

No, SRFI-75 does not "simply" do any such thing, unfortunately.
There are logical entanglements within the standard that mean
you can't tweak the definition of string functions without
simultaneously tweaking things like surface syntax.

The code-point-wise and CaseMapping.txt-wise character and string
functions are perfectly well defined --- perfect additions to the
Scheme diamond.   They are low level, true.  They aren't the best way
to process natural language Unicode text, sure.  But they are also 
foundational to perfect Unicode support.   That much is just fine
and John is right about that.
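
To make "code-point-wise" concrete, here is a minimal sketch.  It is
in Python rather than Scheme only so that it can be run anywhere; the
helper name codepoint_less and the word list are made up for
illustration and come from no SRFI.  It shows that such an ordering is
completely well defined and still not what a reader of natural
language text would expect.

```python
def codepoint_less(a: str, b: str) -> bool:
    """Hypothetical helper: order two strings by their code point sequences."""
    return [ord(c) for c in a] < [ord(c) for c in b]

# Sample words, chosen only for illustration.
words = ["zebra", "Apple", "\u00e9clair", "apple"]

# Sorting by code point puts every capital before every lower-case
# letter, and the accented e-acute (U+00E9) after 'z' (U+007A).
by_code_point = sorted(words, key=lambda s: [ord(c) for c in s])

print(by_code_point)
print(codepoint_less("Zebra", "apple"))
```

A dictionary would file the e-acute word near the other "e" words, and
that is the job of a separate collation function, not of string<?.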

The problem that TB should have called out there, but didn't
(though I'm pretty sure he knows about it), is that the character
and string primitives are not really orthogonal to either the
surface syntax of the language or to communication between Scheme
programs using the standard READ and WRITE procedures.

To be more explicit: 

a) The definitions of the character and string functions must be
   consistent with the surface syntax of the language.  (The 
   language in the standard is a little weaselly on this point
   but that is the simplest interpretation and the one most consistent
   with Scheme's heritage of meta-circular programming techniques.)

   Therefore, if the character and string functions are "crude"
   with respect to natural language, then an implementation 
   *cannot* (cleanly, simply) allow identifier names which are
   globally-natural-language-friendly except in a crude way.

   We should have the goal of implementations which are not culturally
   biased -- implementations which support all languages equally or 
   at least certain non-English languages perfectly.   If we
   force identifiers to be crude in that way today, we cannot achieve
   the larger goal tomorrow without breaking things.

b) An analogous argument applies to the streams emitted and consumed
   by READ and WRITE.   (This isn't *really* a separate point from (a)
   but people commonly treat it that way.)
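
One way the entanglement in (a) and (b) could show up, sketched in
Python under a loud assumption: that identifier equality follows
code-point-wise string equality.  This is not a claim about what
SRFI-75 specifies.  The symbol table and the names intern_symbol,
codepoint_equal and canonical_equal are hypothetical.

```python
import unicodedata

# The same word spelled two ways: precomposed e-acute (U+00E9) versus
# plain 'e' followed by a combining acute accent (U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"

def codepoint_equal(a: str, b: str) -> bool:
    """Equality on raw code point sequences (the 'crude' definition)."""
    return [ord(c) for c in a] == [ord(c) for c in b]

def canonical_equal(a: str, b: str) -> bool:
    """Equality after NFC normalization (Unicode canonical equivalence)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Toy symbol table keyed by the raw code point sequence, standing in
# for a reader that interns identifiers with the crude equality.
symbol_table = {}

def intern_symbol(name: str) -> int:
    """Return the slot for name, allocating a new one on first sight."""
    return symbol_table.setdefault(name, len(symbol_table))

print(codepoint_equal(composed, decomposed))
print(canonical_equal(composed, decomposed))
print(intern_symbol(composed), intern_symbol(decomposed))
```

Under that assumption the two spellings, which look identical on
screen, are two different identifiers.  Changing the rule later would
merge symbols that existing programs and previously written data
treat as distinct.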


It's the surface syntax that is in peril, not any vague notion of what
is "encouraged" or of what the right name is for this or that function.
It doesn't matter that programs using the standard string procedures
currently specified aren't right for some natural language applications;
as John says, that's what new functions are for.  The only problem
with SRFI-75 lies in the (one hopes unintended) implications it has for
all future upward-compatible syntaxes (for code and data, if you regard
them as separate).

-t