
re: A proposal for reserved read-syntax characters



>     {[, ], {, |, }} 
>   & (Pattern_Syntax - (ASCII|Sc|Sm|So))
>   & Pattern_Whitespace

I like that this choice of delimiter characters
provides another argument against the over-broad 
identifier syntax in the current draft.

I'm not sure that definition is right for
Scheme though.  It's especially not clear
to me that *all* of those symbols (Sc, Sm, So)
should be allowed as identifier constituents --
some of them might be better as delimiters (if
they are permitted in source texts at all).

(All of this is another argument that identifier
syntax liberalization is premature.   Since
it isn't needed to get the ball rolling in terms
of portable Unicode-happy programs, skip it for
now.)

-t

--- Begin Message ---
Since Unicode provides 97,655 characters to play with (as of Unicode 4.1),
it may be time to add some characters to the current list of five
reserved syntax characters ([, ], {, }, |).  That would bar the use of
these characters in identifiers, and allow them to be used by any Scheme
system that has redefinable read syntax for whatever purpose.

Unicode defines two non-normative classes for the purpose,
Pattern_Syntax and Pattern_Whitespace.  The intention is that neither may
be used in identifiers.  (The Unicode classes relating to identifiers
are too restrictive for Scheme, and are intended for languages in which
identifiers can't contain symbol characters.)

I propose that the following set of characters be disallowed in identifiers:

{[, ], {, |, }} & (Pattern_Syntax - (ASCII|Sc|Sm|So)) & Pattern_Whitespace

Excluding ASCII characters from Pattern_Syntax permits all of our
existing ASCII identifier characters, regardless of their Unicode status.
The various S codes represent various mathematical and non-mathematical
operator and symbol characters that might plausibly see use in
identifiers.  Pattern_Whitespace is a small set of control/whitespace
characters: TAB, LF, VT, FF, CR, SP, plus the new NEL, LRM, RLM, LS, and PS.
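
As a rough illustration of the set expression above, here is a minimal Python sketch. It reads "&" as set union, which is an assumption (intersection would leave nothing, since the five bracket characters are ASCII). Python's unicodedata module does not expose Pattern_Syntax, so the function takes that set as an argument; the name reserved_characters and the small SAMPLE_PATTERN_SYNTAX set are hypothetical and stand in for the real property data in the Unicode Character Database.

```python
import unicodedata

# The five characters already reserved in Scheme read syntax.
ASCII_RESERVED = set("[]{|}")

# The whitespace set as listed in the message:
# TAB, LF, VT, FF, CR, SP, NEL, LRM, RLM, LS, PS.
PATTERN_WHITESPACE = {
    chr(cp)
    for cp in (0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20,
               0x85, 0x200E, 0x200F, 0x2028, 0x2029)
}

# General categories the proposal leaves available for identifiers.
SYMBOL_CATEGORIES = {"Sc", "Sm", "So"}


def reserved_characters(pattern_syntax):
    """Build the proposed reserved set from a given Pattern_Syntax set.

    pattern_syntax is supplied by the caller (assumption: loaded from
    the Unicode property data; it is not available in unicodedata).
    """
    kept = {
        ch
        for ch in pattern_syntax
        if ord(ch) > 0x7F  # drop ASCII
        and unicodedata.category(ch) not in SYMBOL_CATEGORIES  # drop Sc/Sm/So
    }
    return ASCII_RESERVED | kept | PATTERN_WHITESPACE


# Hypothetical sample only, NOT the full Pattern_Syntax set:
# + and $ (ASCII), U+00A1 (Po), U+00A9 (So), U+00AB (Pi),
# U+00AC (Sm), U+2010 (Pd).
SAMPLE_PATTERN_SYNTAX = set("+$") | {
    "\u00A1", "\u00A9", "\u00AB", "\u00AC", "\u2010",
}

if __name__ == "__main__":
    for ch in sorted(reserved_characters(SAMPLE_PATTERN_SYNTAX)):
        print("%04X %s" % (ord(ch), unicodedata.name(ch, "<control>")))
```

With the sample set, U+00A1, U+00AB and U+2010 end up reserved, while +, $, U+00A9 and U+00AC stay available for identifiers.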

Here's the full list of 186 reserved syntax characters that I'm proposing.
They should be more than enough for even the most grandiose read-syntax
extensions.

005B LEFT SQUARE BRACKET
005D RIGHT SQUARE BRACKET
007B LEFT CURLY BRACKET
007C VERTICAL LINE
007D RIGHT CURLY BRACKET
00A1 INVERTED EXCLAMATION MARK
00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
00BF INVERTED QUESTION MARK
2010 HYPHEN
2011 NON-BREAKING HYPHEN
2012 FIGURE DASH
2013 EN DASH
2014 EM DASH
2015 HORIZONTAL BAR
2016 DOUBLE VERTICAL LINE
2017 DOUBLE LOW LINE
2018 LEFT SINGLE QUOTATION MARK
2019 RIGHT SINGLE QUOTATION MARK
201A SINGLE LOW-9 QUOTATION MARK
201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
201C LEFT DOUBLE QUOTATION MARK
201D RIGHT DOUBLE QUOTATION MARK
201E DOUBLE LOW-9 QUOTATION MARK
201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
2020 DAGGER
2021 DOUBLE DAGGER
2022 BULLET
2023 TRIANGULAR BULLET
2024 ONE DOT LEADER
2025 TWO DOT LEADER
2026 HORIZONTAL ELLIPSIS
2027 HYPHENATION POINT
2030 PER MILLE SIGN
2031 PER TEN THOUSAND SIGN
2032 PRIME
2033 DOUBLE PRIME
2034 TRIPLE PRIME
2035 REVERSED PRIME
2036 REVERSED DOUBLE PRIME
2037 REVERSED TRIPLE PRIME
2038 CARET
2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
203B REFERENCE MARK
203C DOUBLE EXCLAMATION MARK
203D INTERROBANG
203E OVERLINE
2041 CARET INSERTION POINT
2042 ASTERISM
2043 HYPHEN BULLET
2045 LEFT SQUARE BRACKET WITH QUILL
2046 RIGHT SQUARE BRACKET WITH QUILL
2047 DOUBLE QUESTION MARK
2048 QUESTION EXCLAMATION MARK
2049 EXCLAMATION QUESTION MARK
204A TIRONIAN SIGN ET
204B REVERSED PILCROW SIGN
204C BLACK LEFTWARDS BULLET
204D BLACK RIGHTWARDS BULLET
204E LOW ASTERISK
204F REVERSED SEMICOLON
2050 CLOSE UP
2051 TWO ASTERISKS ALIGNED VERTICALLY
2053 SWUNG DASH
2055 FLOWER PUNCTUATION MARK
2056 THREE DOT PUNCTUATION
2057 QUADRUPLE PRIME
2058 FOUR DOT PUNCTUATION
2059 FIVE DOT PUNCTUATION
205A TWO DOT PUNCTUATION
205B FOUR DOT MARK
205C DOTTED CROSS
205D TRICOLON
205E VERTICAL FOUR DOTS
2329 LEFT-POINTING ANGLE BRACKET
232A RIGHT-POINTING ANGLE BRACKET
23B4 TOP SQUARE BRACKET
23B5 BOTTOM SQUARE BRACKET
23B6 BOTTOM SQUARE BRACKET OVER TOP SQUARE BRACKET
2768 MEDIUM LEFT PARENTHESIS ORNAMENT
2769 MEDIUM RIGHT PARENTHESIS ORNAMENT
276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
27C5 LEFT S-SHAPED BAG DELIMITER
27C6 RIGHT S-SHAPED BAG DELIMITER
27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET
27E8 MATHEMATICAL LEFT ANGLE BRACKET
27E9 MATHEMATICAL RIGHT ANGLE BRACKET
27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
2983 LEFT WHITE CURLY BRACKET
2984 RIGHT WHITE CURLY BRACKET
2985 LEFT WHITE PARENTHESIS
2986 RIGHT WHITE PARENTHESIS
2987 Z NOTATION LEFT IMAGE BRACKET
2988 Z NOTATION RIGHT IMAGE BRACKET
2989 Z NOTATION LEFT BINDING BRACKET
298A Z NOTATION RIGHT BINDING BRACKET
298B LEFT SQUARE BRACKET WITH UNDERBAR
298C RIGHT SQUARE BRACKET WITH UNDERBAR
298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
2991 LEFT ANGLE BRACKET WITH DOT
2992 RIGHT ANGLE BRACKET WITH DOT
2993 LEFT ARC LESS-THAN BRACKET
2994 RIGHT ARC GREATER-THAN BRACKET
2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
2996 DOUBLE RIGHT ARC LESS-THAN BRACKET
2997 LEFT BLACK TORTOISE SHELL BRACKET
2998 RIGHT BLACK TORTOISE SHELL BRACKET
29D8 LEFT WIGGLY FENCE
29D9 RIGHT WIGGLY FENCE
29DA LEFT DOUBLE WIGGLY FENCE
29DB RIGHT DOUBLE WIGGLY FENCE
29FC LEFT-POINTING CURVED ANGLE BRACKET
29FD RIGHT-POINTING CURVED ANGLE BRACKET
2E00 RIGHT ANGLE SUBSTITUTION MARKER
2E01 RIGHT ANGLE DOTTED SUBSTITUTION MARKER
2E02 LEFT SUBSTITUTION BRACKET
2E03 RIGHT SUBSTITUTION BRACKET
2E04 LEFT DOTTED SUBSTITUTION BRACKET
2E05 RIGHT DOTTED SUBSTITUTION BRACKET
2E06 RAISED INTERPOLATION MARKER
2E07 RAISED DOTTED INTERPOLATION MARKER
2E08 DOTTED TRANSPOSITION MARKER
2E09 LEFT TRANSPOSITION BRACKET
2E0A RIGHT TRANSPOSITION BRACKET
2E0B RAISED SQUARE
2E0C LEFT RAISED OMISSION BRACKET
2E0D RIGHT RAISED OMISSION BRACKET
2E0E EDITORIAL CORONIS
2E0F PARAGRAPHOS
2E10 FORKED PARAGRAPHOS
2E11 REVERSED FORKED PARAGRAPHOS
2E12 HYPODIASTOLE
2E13 DOTTED OBELOS
2E14 DOWNWARDS ANCORA
2E15 UPWARDS ANCORA
2E16 DOTTED RIGHT-POINTING ANGLE
2E17 DOUBLE OBLIQUE HYPHEN
2E1C LEFT LOW PARAPHRASE BRACKET
2E1D RIGHT LOW PARAPHRASE BRACKET
3001 IDEOGRAPHIC COMMA
3002 IDEOGRAPHIC FULL STOP
3003 DITTO MARK
3008 LEFT ANGLE BRACKET
3009 RIGHT ANGLE BRACKET
300A LEFT DOUBLE ANGLE BRACKET
300B RIGHT DOUBLE ANGLE BRACKET
300C LEFT CORNER BRACKET
300D RIGHT CORNER BRACKET
300E LEFT WHITE CORNER BRACKET
300F RIGHT WHITE CORNER BRACKET
3010 LEFT BLACK LENTICULAR BRACKET
3011 RIGHT BLACK LENTICULAR BRACKET
3014 LEFT TORTOISE SHELL BRACKET
3015 RIGHT TORTOISE SHELL BRACKET
3016 LEFT WHITE LENTICULAR BRACKET
3017 RIGHT WHITE LENTICULAR BRACKET
3018 LEFT WHITE TORTOISE SHELL BRACKET
3019 RIGHT WHITE TORTOISE SHELL BRACKET
301A LEFT WHITE SQUARE BRACKET
301B RIGHT WHITE SQUARE BRACKET
301C WAVE DASH
301D REVERSED DOUBLE PRIME QUOTATION MARK
301E DOUBLE PRIME QUOTATION MARK
301F LOW DOUBLE PRIME QUOTATION MARK
3030 WAVY DASH
FD3E ORNATE LEFT PARENTHESIS
FD3F ORNATE RIGHT PARENTHESIS
FE45 SESAME DOT
FE46 WHITE SESAME DOT

-- 
May the hair on your toes never fall out!       John Cowan
        --Thorin Oakenshield (to Bilbo)         jcowan@xxxxxxxxxxxxxxxxx



--- End Message ---