
Re: Parsing Scheme [was Re: strings draft]

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

To: Tom Lord <lord@xxxxxxx>
Subject: Re: Parsing Scheme [was Re: strings draft]
From: tb@xxxxxxxxxx (Thomas Bushnell, BSG)
Date: 23 Jan 2004 22:48:21 -0800
Cc: Ken.Dickey@xxxxxxxxxxxxxx, srfi-50@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-50@xxxxxxxxxxxxxxxxx
In-reply-to: <200401232252.OAA27834@xxxxxxxxxxxxxxxxxxxxxxx>
References: <200401220511.VAA18432@xxxxxxxxxxxxxxxxxxxxxxx> <200401230907.27619.Ken.Dickey@xxxxxxxxxxxxxx> <200401232020.MAA26771@xxxxxxxxxxxxxxxxxxxxxxx> <87hdymz7fk.fsf@xxxxxxxxxxxxxxxxx> <200401232252.OAA27834@xxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Tom Lord <lord@xxxxxxx> writes:

> CHAR-UPCASE and CHAR-DOWNCASE are mandatory and STRING-CI=? is defined
> in terms of CHAR-CI=?

If you're asking what should be in the next RnRS, then there is no
sense in which CHAR-UPCASE is mandatory.  The editors can choose to
include it, or not.  I am speaking of what I would like the next RnRS
to say, precisely because the current version is entirely unsuitable
for correct character handling.

There *is no* good implementation of R5RS if you want the Scheme
character type to be based upon Unicode.

> In the latter case, CHAR-DOWNCASE behaves in a linguistically odd way
> for Turkish speakers because it either converts #\I to #\i or #\I to #\I.

This is not "linguistically odd", it's incorrect.  It is in fact
incorrect in a way which violates the best Unicode practices.  It is
this which I spoke of a while back when I first entered the thread.
If you are saying that it doesn't matter that the R5RS character type
cannot be used with the best Unicode practices, then I disagree
strongly.  
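
The Turkish problem can be seen concretely in any language whose case
functions are locale-independent. What follows is only an illustrative
sketch in Python (the thread itself concerns Scheme); it relies on the
built-in str.lower and str.upper, which apply Unicode's default,
locale-independent mappings. The variable names are invented for the
example.

```python
# Python's str.lower()/str.upper() use Unicode's default (locale-blind)
# case mappings, so they show what a locale-blind CHAR-DOWNCASE would do.

DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default mapping: "I" lowercases to "i". Turkish text expects dotless i.
default_lower = "I".lower()

# Lowercasing dotted capital I yields two code points ("i" + U+0307),
# so the result cannot be returned as a single character at all.
dotted_lower = DOTTED_CAP_I.lower()

# Uppercasing sharp s also changes the length of the string.
sharp_s_upper = "\u00df".upper()
```

The last two results are why a character-to-character procedure cannot
express the full Unicode case mappings, whatever locale it assumes.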

> The character casemappings would still need to be defined to specify
> Scheme.  Reifying that definition into Scheme in the form of those
> procedures is only natural.

Huh?  Why on earth would it?  We could specify Scheme and give *no*
case-mapping functions, and instead only specify the output identifier
matching function.  I am coming to believe that it should not be
specified as string-ci=?, in fact, because a-with-accent-grave is not
ci=? to a-without-accent, but a system might sensibly choose to
treat them as equivalent for identifiers.

There should be string-id=? (or some other name) which implements the
Scheme identifier matching rules, which should be specified for the
required character set, and left unspecified for all other
characters.  

None of this requires or even implicitly uses a case mapping function.
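
As a rough sketch of how such a predicate could look, here is a
Python illustration under stated assumptions: the required character
set is taken to be ASCII, the name string_id_equal and the
fold_accents switch are hypothetical, and the accent folding stands in
for one possible implementation choice outside the required set. The
table only records which required characters are treated as
equivalent; nothing in it is exposed as a case-mapping procedure.

```python
import string
import unicodedata

# Equivalence classes for the required character set only (assumed ASCII):
# each letter shares a class representative with its other-case partner.
_REQUIRED_CLASS = {}
for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
    _REQUIRED_CLASS[lower] = lower
    _REQUIRED_CLASS[upper] = lower


def _strip_marks(s):
    """Hypothetical implementation choice: drop combining accents."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def string_id_equal(a, b, fold_accents=False):
    """Hypothetical string-id=?: identifier matching by equivalence class.

    Characters outside the required set are compared as-is unless the
    implementation opts into extra folding (here: fold_accents).
    """
    def key(s):
        if fold_accents:
            s = _strip_marks(s)
        return "".join(_REQUIRED_CLASS.get(c, c) for c in s)

    return key(a) == key(b)
```

With these assumptions, string_id_equal("Foo", "fOO") holds, while
a-with-accent-grave matches plain a only when fold_accents is true.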

> The standard would still need to specify CHAR-DOWNCASE.   

Why?  Is there some government bureau that will shut us down if the
next RnRS eliminates it?

I don't mind STRING-DOWNCASE, of course, which should have a locale
argument and be specified to permit the Correct Unicode Thing.
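
A minimal sketch of what a locale-taking STRING-DOWNCASE might do,
again in Python and again only as an illustration: the function name,
the locale parameter, and the use of bare language tags ("tr", "az")
are assumptions, and only the Turkic tailoring of I is handled. The
Unicode rule for "I" followed by a combining dot above is deliberately
left out.

```python
def string_downcase(s, locale=None):
    """Hypothetical STRING-DOWNCASE with a locale argument.

    Without a locale, Unicode's default mappings apply. For Turkish and
    Azerbaijani, capital I maps to dotless i and dotted capital I maps
    to plain i before the default mappings handle everything else.
    """
    if locale in ("tr", "az"):
        # Turkic tailoring (partial): I -> U+0131, U+0130 -> i.
        s = s.replace("\u0130", "i").replace("I", "\u0131")
    return s.lower()
```

Because it works on whole strings, such a procedure is free to return
a result of a different length than its input, which a per-character
CHAR-DOWNCASE cannot do.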

Thomas
