
Re: Issues with Unicode

This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.

On Wed, 10 May 2006, John Cowan wrote:

>bear scripsit:
>> Immutable strings - With Unicode and threads, it's the only viable
>> implementation strategy.  [...]  Once you've done the legwork for
>> immutable strings, providing string-set!  and similar is a very short
>> further trip.
>Part of the contract for string-set! is that it mutates its
>first argument.

Right.  The "short further trip" of course is to provide
some kind of string-head which is effectively like
half a cons.  Then string-set! can be implemented by
creating the new string-body using the functional
underpinnings, and updating the string-head to point at
the new string-body.  Now you just use the string-head
as your string representation, and you've provided
string-set!, with its current contract, limiting the
mutation to a single point.
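
A rough sketch of the string-head idea, in Python rather than Scheme only so that it is easy to run. The names StringHead, string_ref and string_set are made up for illustration, and the lock is just one assumed way to get the "atomic single mutation"; nothing here is taken from an actual implementation.

```python
import threading


class StringHead:
    """The 'half a cons': a single mutable slot pointing at an immutable body."""

    def __init__(self, body: str) -> None:
        self._body = body              # Python str is immutable, so it stands in for the body
        self._lock = threading.Lock()  # assumption: a lock serializes updates of the slot

    def body(self) -> str:
        """Return the current immutable body (a stable snapshot)."""
        return self._body


def string_ref(head: StringHead, k: int) -> str:
    """Read character k from whatever body the head currently points at."""
    return head.body()[k]


def string_set(head: StringHead, k: int, ch: str) -> None:
    """Mimic string-set!: build a new body functionally, then repoint the head."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    with head._lock:
        old = head._body
        if not 0 <= k < len(old):
            raise IndexError("string index out of range")
        new_body = old[:k] + ch + old[k + 1:]  # purely functional update of the body
        head._body = new_body                  # the single point of mutation
```

Every holder of the same head sees the change, which is what the string-set! contract asks for, while anyone who grabbed a body beforehand keeps an unchanged snapshot.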

>> Removing string-set! would be way too much of a flag-day for
>> existing scheme code.
>Can't have it both ways.  It will also be a flag day
>to replace string-set! with string-update or some similar
>functional equivalent.

Didn't want it both ways.  String-set!, with unchanged contract,
can be implemented on top of purely functional methods for
manipulating string bodies and an atomic single mutation for
manipulating the string head.

>> Regarding what ought to be legal as an identifier: I think
>> control characters, whitespace (properties Zs, Zl, Zp) and
>> delimiters (properties Ps, and Pc) ought not appear in
>> identifiers.  I wouldn't be at all upset if a standard also
>> forbade combining characters; after all, identifiers and
>> symbol names don't need the full functionality of strings.
>In tht cs thy cnnt be ntrl-lngg trms in an of the lrg set
>of lnggs tht us cmbnng chrctrs fr vwls.

Okay, good point.  And a good reason to allow combining
characters in identifiers.  I'm not upset either way.
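
For concreteness, here is a minimal sketch of the identifier rule as quoted above, again in Python and using its standard unicodedata module. The function names and the allow_combining switch are hypothetical; the category lists are taken literally from the quoted proposal (control characters read as Cc; Zs, Zl, Zp; Ps and Pc), with combining characters read as the mark categories Mn, Mc and Me.

```python
import unicodedata

# General categories excluded by the quoted proposal, taken literally.
FORBIDDEN = {"Cc", "Zs", "Zl", "Zp", "Ps", "Pc"}
# Assumption: "combining characters" means the three mark categories.
COMBINING = {"Mn", "Mc", "Me"}


def identifier_char_ok(ch: str, allow_combining: bool = True) -> bool:
    """Check one character against the excluded general categories."""
    cat = unicodedata.category(ch)
    if cat in FORBIDDEN:
        return False
    if not allow_combining and cat in COMBINING:
        return False
    return True


def identifier_ok(name: str, allow_combining: bool = True) -> bool:
    """A non-empty name is acceptable when every character passes."""
    return bool(name) and all(identifier_char_ok(c, allow_combining) for c in name)
```

Note that Pc, as literally listed, is connector punctuation and so covers the underscore. Under the stricter variant, identifier_ok("e\u0301", allow_combining=False) is rejected, which is exactly the objection about languages that write vowels with combining characters.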

>> I wouldn't be at all upset if a standard also forbade all
>> characters not yet assigned as of Unicode 4.1.0, with
>> the implication that this forbidding would be permanent
>> across Scheme report revisions, even though later Unicode
>> versions doubtless will come along.
>Which amounts to saying that programmers who use some
>languages get to use meaningful identifiers and others don't.
>That's manifestly unfair.

Hah?  Unicode already encompasses, I believe, every living
language with a writing system.  If you mean that there are
programmers who can't get meaningful identifiers using the
character set defined as of Unicode 4.1.0, I want to know
who those programmers are.

Meanwhile, allowing identifier syntax to shift with every
version of Unicode creates the potential for version