Re: Issues with Unicode
On Wed, 10 May 2006, John Cowan wrote:
>> Immutable strings - With Unicode and threads, it's the only viable
>> implementation strategy. [...] Once you've done the legwork for
>> immutable strings, providing string-set! and similar is a very short
>> further trip.
>Part of the contract for string-set! is that it mutates its
Right. The "short further trip" of course is to provide
some kind of string-head which is effectively like
half a cons. Then string-set! can be implemented by
creating the new string-body using the functional
underpinnings, and updating the string-head to point at
the new string-body. Now you just use the string-head
as your string representation, and you've provided
string-set!, with its current contract, limiting the
mutation to a single point.
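A minimal sketch of the string-head idea, written in Python rather than Scheme. It assumes Python's immutable str stands in for the purely functional string body. The class and method names (StringHead, string_ref, string_set) are hypothetical and only mirror the Scheme procedures. The head has exactly one mutable slot, so string-set! keeps its contract: every reference to the head sees the change.

```python
# Hypothetical sketch: a mutable "string-head" (like half a cons) whose only
# mutable slot points at an immutable body. Python's str plays the role of
# the purely functional body; names are illustrative, not from any Scheme.


class StringHead:
    """Mutable handle whose single mutable slot references an immutable body."""

    __slots__ = ("body",)

    def __init__(self, body: str) -> None:
        self.body = body  # immutable body; never modified in place

    def string_ref(self, k: int) -> str:
        return self.body[k]

    def string_set(self, k: int, ch: str) -> None:
        # Build a new body functionally, then update the one mutable slot.
        if len(ch) != 1:
            raise ValueError("expected a single character")
        old = self.body
        if not 0 <= k < len(old):
            raise IndexError("string index out of range")
        self.body = old[:k] + ch + old[k + 1:]

    def __str__(self) -> str:
        return self.body


s = StringHead("hello")
alias = s           # a second reference to the same string object
s.string_set(0, "j")
print(str(alias))   # jello: the mutation is visible through every reference
```

Rebuilding the whole body costs O(n) per string-set!; a real implementation would presumably use a tree or rope body, so that a functional update shares most of the old structure.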
>> Removing string-set! would be way too much of a flag-day for
>> existing scheme code.
>Can't have it both ways. It will also be a flag day
>to replace string-set! with string-update or some similar
Didn't want it both ways. String-set!, with unchanged contract,
can be implemented on top of purely functional methods for
manipulating string bodies and an atomic single mutation for
manipulating the string head.
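To illustrate the "atomic single mutation" point with threads, here is a hedged Python sketch; SharedString and its methods are hypothetical. Writers serialize on a lock so that two concurrent string-set! calls cannot both read the same old body and lose an update. Readers never lock, because they take a single reference to a body that can no longer change. The lock-free read assumes a reference load is atomic, as it is in CPython; in C or Rust the head would be an atomic pointer.

```python
import threading


class SharedString:
    """Hypothetical: the head is the single point of mutation. Writers
    serialize on a lock; readers just grab the current immutable body."""

    def __init__(self, body: str) -> None:
        self._body = body
        self._lock = threading.Lock()

    def snapshot(self) -> str:
        return self._body  # one reference read; the body itself never changes

    def string_set(self, k: int, ch: str) -> None:
        with self._lock:  # read-build-swap is atomic with respect to other writers
            old = self._body
            self._body = old[:k] + ch + old[k + 1:]


shared = SharedString("a" * 8)


def writer(i: int) -> None:
    shared.string_set(i, "b")


threads = [threading.Thread(target=writer, args=(i,)) for i in range(8)]
before = shared.snapshot()  # taken before any writer starts
for t in threads:
    t.start()
for t in threads:
    t.join()
print(before)             # aaaaaaaa: an earlier snapshot is unaffected
print(shared.snapshot())  # bbbbbbbb: no update was lost
```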
>> Regarding what ought to be legal as an identifier: I think
>> control characters, whitespace (properties Zs, Zl, Zp) and
>> delimiters (properties Ps, and Pc) ought not appear in
>> identifiers. I wouldn't be at all upset if a standard also
>> forbade combining characters; after all, identifiers and
>> symbol names don't need the full functionality of strings.
>In tht cs thy cnnt be ntrl-lngg trms in an of the lrg set
>of lnggs tht us cmbnng chrctrs fr vwls.
Okay, good point, and a good reason to allow combining
characters in identifiers. I'm not upset either way.
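A sketch of the identifier rule under discussion, assuming Python's unicodedata module as the source of general categories. The forbidden set is taken literally from the quoted proposal. Combining marks (Mn, Mc, Me) are allowed, following the concession above. is_identifier is a hypothetical name, and this is not any Scheme report's actual identifier syntax.

```python
import unicodedata

# General categories named in the quoted proposal: control characters (Cc),
# whitespace (Zs, Zl, Zp) and the listed delimiter categories (Ps, Pc).
# Combining marks (Mn, Mc, Me) are deliberately absent, so they are allowed.
FORBIDDEN = {"Cc", "Zs", "Zl", "Zp", "Ps", "Pc"}


def is_identifier(name: str) -> bool:
    """True if name is non-empty and no character is in a forbidden category."""
    if not name:
        return False
    return all(unicodedata.category(c) not in FORBIDDEN for c in name)


print(is_identifier("lambda"))    # True
print(is_identifier("ta\u0301"))  # True: 'a' + COMBINING ACUTE ACCENT (Mn)
print(is_identifier("foo bar"))   # False: space is Zs
print(is_identifier("f(x)"))      # False: '(' is Ps
```

Taken literally, Pc includes the underscore (U+005F LOW LINE), so this rule also rejects a_b; whether that was intended is not clear from the quote.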
>> I wouldn't be at all upset if a standard also forbade all
>> characters not yet assigned as of Unicode 4.1.0, with
>> the implication that this forbidding would be permanent
>> across Scheme report revisions, even though later Unicode
>> versions doubtless will come along.
>Which amounts to saying that programmers who use some
>languages get to use meaningful identifiers and others don't.
>That's manifestly unfair.
Huh? Unicode already encompasses, I believe, every living
language with a writing system. If you mean that there are
programmers who can't get meaningful identifiers using the
character set defined as of Unicode 4.1.0, I want to know
who those programmers are.
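Freezing the identifier repertoire means checking against a fixed snapshot of the Unicode character database rather than whatever tables the runtime happens to ship. Python's standard library pins only one old version, Unicode 3.2.0 (unicodedata.ucd_3_2_0). This hedged sketch therefore uses 3.2.0 as a stand-in for the proposed 4.1.0 cutoff. A Scheme implementation would presumably embed its own pinned table. The function name is hypothetical.

```python
import unicodedata

# Stand-in for "assigned as of Unicode 4.1.0": the stdlib only ships a pinned
# Unicode 3.2.0 database, so this sketch freezes at 3.2.0 instead. The idea is
# the same: consult a fixed snapshot, not the runtime's current tables.
FROZEN = unicodedata.ucd_3_2_0


def assigned_in_frozen_version(name: str) -> bool:
    """True if every character was already assigned (category not Cn) in the
    frozen database, so the answer does not change when Python upgrades its
    own Unicode tables."""
    return all(FROZEN.category(c) != "Cn" for c in name)


print(FROZEN.unidata_version)                    # 3.2.0
print(assigned_in_frozen_version("λx"))          # True
print(assigned_in_frozen_version("\U0001F600"))  # False: added in Unicode 6.1
```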
Meanwhile, allowing identifier syntax to shift with every
version of Unicode creates the potential for version