Re: on waste-of-time arguments....



Thomas Lord wrote:

> The code-point-wise and CaseMapping.txt-wise character and string
> functions are perfectly well defined --- perfect additions to the
> Scheme diamond.   They are low level, true.  They aren't the best way
> to process natural language Unicode text, sure.  But they are also 
> foundational to perfect Unicode support.   That much is just fine
> and John is right about that.

Thank you.

> a) The definitions of the character and string functions must be
>    consistent with the surface syntax of the language.  (The 
>    language in the standard is a little weaselly on this point
>    but that is the simplest interpretation and the one most consistent
>    with Scheme's heritage of meta-circular programming techniques.)

I entirely agree with this.

>    Therefore, if the character and string functions are "crude"
>    with respect to natural language, then an implementation 
>    *cannot* (cleanly, simply) allow identifier names which are
>    globally-natural-language-friendly except in a crude way.

Can you give an example?  I don't understand how this principle applies.
SRFI 75 provides case-{in,}sensitive {character,string} {identity,collation}
functions, and provides syntax for the full scope of Unicode scalar values
as characters and USV sequences as strings.  Furthermore, every character
string can be mapped to a symbol and vice versa (excluding uninterned
symbols, which are not part of the standard).  What is more, identifiers
are explicitly made case-sensitive, so the definition of the string-ci family
no longer affects them.
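
To make that last point concrete, here is a minimal sketch in Python rather
than Scheme. The names intern_symbol and string_ci_equal are hypothetical
and appear in no standard; Python's casefold() is only a stand-in for
whatever case-insensitive comparison an implementation provides. The
sketch assumes identifiers are interned by their exact code-point sequence.

```python
# Hypothetical model, not any implementation's actual symbol table:
# identifiers are interned by their exact code-point sequence.
symbols = {}

def intern_symbol(name: str) -> int:
    """Return a stable id; distinct code-point sequences get distinct ids."""
    return symbols.setdefault(name, len(symbols))

def string_ci_equal(a: str, b: str) -> bool:
    """Stand-in for a case-insensitive string comparison (assumption)."""
    return a.casefold() == b.casefold()

# Case-sensitive identifiers: "lambda" and "Lambda" are different symbols.
same_symbol = intern_symbol("lambda") == intern_symbol("Lambda")

# The case-insensitive comparison still treats the two strings as equal,
# but that result plays no part in deciding symbol identity.
same_ci = string_ci_equal("lambda", "Lambda")
```

Under these assumptions same_symbol is false and same_ci is true, so the
two notions of equality are independent of each other.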

>    We should have the goal of implementations which are not culturally
>    biased -- implementations which support all languages equally or 
>    at least certain non-English languages perfectly.   If we 
>    force identifiers to be crude in that way today, we cannot achieve
>    the larger goal tomorrow without breaking things.

I don't see how we are forcing identifiers to be crude.  We are permitting
distinct identifiers that look exactly alike, yes.  However, if we
allow identifiers other than in Latin script at all, then such spoofs
are always possible; to take only the simplest example, Latin A, Greek Alpha,
and Cyrillic A look exactly alike.
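
The three letters can be checked directly. The following is a small
sketch in Python using the standard unicodedata module; the variable
names are only illustrative. It shows that the three look-alike
capitals are distinct code points and that NFKC normalization does not
unify them.

```python
import unicodedata

# Latin A, Greek Alpha, Cyrillic A: visually alike, distinct code points.
lookalikes = ["\u0041", "\u0391", "\u0410"]

# Official Unicode character names for each code point.
names = [unicodedata.name(c) for c in lookalikes]

# Compatibility normalization (NFKC) maps each letter to itself,
# so the three remain different after normalizing.
normalized = {unicodedata.normalize("NFKC", c) for c in lookalikes}

for c, n in zip(lookalikes, names):
    print(f"U+{ord(c):04X} {n}")
```

Any identifier syntax that admits all three scripts therefore admits
three different one-letter identifiers that render alike.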

> b) An analogous argument applies to the streams emitted and consumed
>    by READ and WRITE.   (This isn't *really* a separate point from (a)
>    but people commonly treat it that way.)

I don't understand this argument either, alas.

> It's the surface syntax, not any vague notion of what is "encouraged" 
> or what is the right name for this or that function that is in peril.
> It doesn't matter that programs using the standard string procedures
> currently specified aren't right for some natural language applications
> -- as John says, that's what new functions are for.   The only problem
> with s-75 is the (one hopes unintended) implications it has for all
> future upward compatible syntaxes (for code and data, if you regard
> them as separate).

Please spell out these implications (preferably with examples), as I
remain entirely in the dark.

-- 
John Cowan      jcowan@xxxxxxxxxxxxxxxxx        http://www.reutershealth.com
        "Not to know The Smiths is not to know K.X.U."  --K.X.U.