integrating PCREs



I propose integrating PCREs by having the same API work for both. The grammar for SREs would be changed to require that they are lists. This would remove the ambiguity: strings are PCREs and lists are SREs.
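
A minimal sketch of the dispatch this implies, assuming a front end that only needs to tell the two notations apart. The name `regexp-notation` is hypothetical and not part of any proposed API; the code uses only standard R7RS procedures.

```scheme
;; Hypothetical helper: classify a regexp argument by its datum type.
;; Under the proposal, a string is read as a PCRE and a list as an SRE.
(define (regexp-notation re)
  (cond ((string? re) 'pcre)
        ((pair? re)   'sre)
        (else (error "regexp must be a string (PCRE) or a list (SRE)" re))))

(regexp-notation "a.c")          ; => pcre  (`.` matches any character)
(regexp-notation '(seq "a.c"))   ; => sre   (the literal characters a.c)
```

One consequence of this design is that a literal string match has to be wrapped, for example in `(seq ...)`, to be read as an SRE rather than as a PCRE.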

<sre> ::=
     | <cset-sre>                  ; A character set match.
     | <outer-sre>

<outer-sre> ::=
     | (* <inner-sre> ...)               ; 0 or more matches.
     | (+ <inner-sre> ...)               ; 1 or more matches.
     | (? <inner-sre> ...)               ; 0 or 1 matches.
     | (= <n> <inner-sre> ...)           ; <n> matches.
     | (>= <n> <inner-sre> ...)          ; <n> or more matches.
     | (** <n> <m> <inner-sre> ...)      ; <n> to <m> matches.

     | (|  <inner-sre> ...)              ; Alternation.
     | (or <inner-sre> ...)

     | (:   <inner-sre> ...)             ; Sequence.
     | (seq <inner-sre> ...)
     | ($ <inner-sre> ...)               ; Numbered submatch.
     | (submatch <inner-sre> ...)
     | (=> <name> <inner-sre> ...)       ; Named submatch. <name> is
     | (submatch-named <name> <inner-sre> ...)   ;  a symbol.

     | (w/case <inner-sre> ...)          ; Introduce a case-sensitive context.
     | (w/nocase <inner-sre> ...)        ; Introduce a case-insensitive context.

     | (w/unicode   <inner-sre> ...)     ; Introduce a unicode context.
     | (w/ascii <inner-sre> ...)         ; Introduce an ascii context.
     | (word <inner-sre> ...)            ; A sre wrapped in word boundaries.
     | (word+ <inner-cset-sre> ...)      ; A single word restricted to a cset.
     | word                        ; A single word.

     | (?? <inner-sre> ...)                ; A non-greedy pattern, 0 or 1 match.
     | (*? <inner-sre> ...)                ; Non-greedy 0 or more matches.
     | (**? <m> <n> <inner-sre> ...)       ; Non-greedy <m> to <n> matches.
     | (look-ahead <inner-sre> ...)      ; Zero-width look-ahead assertion.
     | (look-behind <inner-sre> ...)     ; Zero-width look-behind assertion.
     | (neg-look-ahead <inner-sre> ...)  ; Zero-width negative look-ahead assertion.
     | (neg-look-behind <inner-sre> ...) ; Zero-width negative look-behind assertion.

<inner-sre> ::=
     | <outer-sre>
     | <inner-cset-sre>
     | <string>                    ; A literal string match.
     | bos                         ; Beginning of string.
     | eos                         ; End of string.

     | bol                         ; Beginning of line.
     | eol                         ; End of line.

     | bog                         ; Beginning of grapheme cluster.
     | eog                         ; End of grapheme cluster.
     | grapheme                    ; A single grapheme cluster.

     | bow                         ; Beginning of word.
     | eow                         ; End of word.
     | nwb                         ; A non-word boundary.

<cset-sre> ::=
     | (<string>)                  ; literal char set
     | (/ <range-spec> ...)        ; ranges
     | (or <inner-cset-sre> ...)         ; union
     | (and <inner-cset-sre> ...)        ; intersection
     | (- <inner-cset-sre> ...)          ; difference
     | (~ <inner-cset-sre> ...)          ; complement of union
     | (w/case <inner-cset-sre> ...)     ; case and unicode toggling
     | (w/nocase <inner-cset-sre> ...)
     | (w/ascii <inner-cset-sre> ...)
     | (w/unicode <inner-cset-sre> ...)

<inner-cset-sre> ::=
     | <cset-sre>
     | <char>                      ; literal char
     | "<char>"                    ; string of one char
     | <char-set>                  ; embedded SRFI 14 char set
     | any | nonl | ascii | lower-case | lower
     | upper-case | upper | alphabetic | alpha
     | numeric | num | alphanumeric | alphanum | alnum
     | punctuation | punct | symbol | graphic | graph
     | whitespace | white | space | printing | print
     | control | cntrl | hex-digit | xdigit

<range-spec> ::= <string> | <char>
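
To make the productions concrete, here are a few forms that should be derivable from the grammar above, written as quoted Scheme data. They are illustrative readings of the productions and the variable names are hypothetical; nothing here calls into a regexp library.

```scheme
;; Illustrative SREs only: each value is quoted data built from the
;; productions above, so each one is a list at the top level.
(define digits     '(+ numeric))             ; 1 or more numeric characters
(define hex-byte   '(= 2 hex-digit))         ; exactly 2 hex digits
(define whole-line '(seq bol (* nonl) eol))  ; sequence anchored to a line
(define key-value                            ; two named submatches
  '(seq (=> key (+ alpha)) "=" (=> value (+ alnum))))
(define yes-or-no  '(w/nocase (or "yes" "no")))  ; case-insensitive alternation
```

Because every example is a list, none of them can be mistaken for a PCRE string.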