This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
On Wed, 11 Feb 2004, Bradd W. Szonye wrote:

>> At Tue, 10 Feb 2004 13:06:28 -0800 (PST), Tom Lord wrote:
>>> There is an easy example of why such a category is desirable in
>>> computing. Let's suppose that I'm going to specify the lexical
>>> syntax of identifiers in a programming language. As part of that
>>> specification, I'll need to identify this category. (For an example,
>>> see "Unicode Technical Report #31: Identifier and Pattern Syntax",
>>> http://www.unicode.org/reports/tr31/tr31-2.html)
>
>Alex Shinn wrote:
>> We may want to take that report with a grain of salt for Scheme. A
>> simpler approach would be to define Scheme identifiers as everything
>> _excluding_ the reserved punctuation characters, optionally allowing
>> Unicode variations on those characters and extending the definition of
>> whitespace. Most Schemes already work in this manner, despite the
>> fact that R5RS uses an inclusive list ....
>
>Agreed. It has the same basic flaw as Annex 7 of UTR 15: It isn't a
>syntax for programming-language identifiers, it's a syntax for C-family
>identifiers! Both reports blithely ignore the fact that not all

Agreed.
There are some appropriate restrictions, I think; identifiers should not begin with:

* a combining character
* a non-character codepoint
* a whitespace character
* a control character
* characters which can begin syntactically valid numbers
  (digits, sign, point)
* a delimiter (parens, at least)

Identifiers should not contain:

* whitespace
* delimiters
* non-character codepoints
* control characters
* invalid sequences

The minimum requirement for case insensitivity as defined by R5RS gives another rule:

* no character in an identifier ought to be automatically converted
  to the implementation's preferred case (and no identifier differing
  only by that character versus another ought to be considered the
  same identifier) unless it is part of a one-to-one reciprocal pair
  of upper and lower case characters as identified by char-upcase,
  char-downcase, and char-ci=?.

This finally is the property that is required for the char-alphabetic? characters in the portable character set: R5RS does not say so specifically, but it is not possible to comply with R5RS without meeting this requirement.

Note that R5RS permits 'rules lawyering' in terms of this requirement; an implementation of R5RS is fairly easy if no characters other than a ... z and A ... Z are case-folded in case-insensitive identifiers and char-alphabetic? returns #t for only those characters. The information returned from char-alphabetic? would be false in that case for all other alphabetic characters, but the letter of R5RS (so to speak) would be satisfied, however uselessly.

Bear