
integrating PCREs

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.



I propose to integrate PCREs by having the same API accept both PCREs and SREs. The grammar for SREs would be changed to require that every SRE is a list. This removes the ambiguity: strings are PCREs and lists are SREs.
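
A minimal sketch of the dispatch this rule enables. It is written in Python rather than Scheme so it can run standalone, and it rests on several assumptions: Scheme symbols are modeled by a hypothetical `Sym` class, Python's `re` module stands in for a PCRE engine, and `compile_regexp` and `sre_to_regex` are illustrative names, not part of any SRFI. Only literal strings, sequences and alternation are translated.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for a Scheme symbol such as `:` or `or`."""
    name: str


def sre_to_regex(sre):
    """Translate a tiny SRE subset: literal strings, (: ...)/(seq ...), (or ...)/(| ...)."""
    if isinstance(sre, str):
        return re.escape(sre)  # inside an SRE, a string is a literal match
    if isinstance(sre, list) and sre and isinstance(sre[0], Sym):
        head, parts = sre[0].name, [sre_to_regex(x) for x in sre[1:]]
        if head in (":", "seq"):
            return "(?:" + "".join(parts) + ")"
        if head in ("|", "or"):
            # An empty alternation can never match.
            return "(?:" + "|".join(parts) + ")" if parts else "(?!)"
    raise ValueError(f"unsupported SRE: {sre!r}")


def compile_regexp(pattern):
    """One entry point: strings are PCRE-style patterns, lists are SREs."""
    if isinstance(pattern, str):
        return re.compile(pattern)  # Python's re approximates PCRE syntax here
    if isinstance(pattern, list):
        return re.compile(sre_to_regex(pattern))
    raise TypeError(f"expected a string or a list, got {type(pattern).__name__}")
```

With this split, `compile_regexp("a+")` treats `+` as a quantifier, while `compile_regexp([Sym(":"), "a+"])` (the SRE `(: "a+")`) matches only the literal two-character string `a+`.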

<sre> ::=
     | <cset-sre>                  ; A character set match.
     | <outer-sre>

<outer-sre> ::=
     | (* <inner-sre> ...)               ; 0 or more matches.
     | (+ <inner-sre> ...)               ; 1 or more matches.
     | (? <inner-sre> ...)               ; 0 or 1 matches.
     | (= <n> <inner-sre> ...)           ; <n> matches.
     | (>= <n> <inner-sre> ...)          ; <n> or more matches.
     | (** <n> <m> <inner-sre> ...)      ; <n> to <m> matches.

     | (|  <inner-sre> ...)              ; Alternation.
     | (or <inner-sre> ...)

     | (:   <inner-sre> ...)             ; Sequence.
     | (seq <inner-sre> ...)
     | ($ <inner-sre> ...)               ; Numbered submatch.
     | (submatch <inner-sre> ...)
     | (=> <name> <inner-sre> ...)               ; Named submatch. <name> is
     | (submatch-named <name> <inner-sre> ...)   ;  a symbol.

     | (w/case <inner-sre> ...)          ; Introduce a case-sensitive context.
     | (w/nocase <inner-sre> ...)        ; Introduce a case-insensitive context.

     | (w/unicode   <inner-sre> ...)     ; Introduce a unicode context.
     | (w/ascii <inner-sre> ...)         ; Introduce an ascii context.
     | (word <inner-sre> ...)            ; An SRE wrapped in word boundaries.
     | (word+ <inner-cset-sre> ...)      ; A single word restricted to a cset.
     | word                        ; A single word.

     | (?? <inner-sre> ...)                ; Non-greedy 0 or 1 matches.
     | (*? <inner-sre> ...)                ; Non-greedy 0 or more matches.
     | (**? <n> <m> <inner-sre> ...)       ; Non-greedy <n> to <m> matches.
     | (look-ahead <inner-sre> ...)        ; Zero-width look-ahead assertion.
     | (look-behind <inner-sre> ...)       ; Zero-width look-behind assertion.
     | (neg-look-ahead <inner-sre> ...)    ; Zero-width negative look-ahead assertion.
     | (neg-look-behind <inner-sre> ...)   ; Zero-width negative look-behind assertion.
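
A sketch of how a shared API might lower the <outer-sre> forms to PCRE-style syntax. Python again stands in for Scheme, and `Sym` and `sre_to_regex` are hypothetical names. `(=> name ...)` is emitted as `(?P<name>...)`, a spelling both Python's `re` and PCRE accept; `word`, `word+`, `w/ascii` and `w/unicode` are left out. Each form wraps its body in a group so a quantifier applies to the whole `<inner-sre> ...` sequence, as the grammar implies.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for a Scheme symbol (`*`, `=>`, `look-ahead`, ...)."""
    name: str


# Postfix quantifiers for forms whose arguments are all <inner-sre>.
SIMPLE_QUANTIFIERS = {"*": "*", "+": "+", "?": "?", "*?": "*?", "??": "??"}

# Opening/closing text for grouping, case-context and lookaround forms.
WRAPPERS = {
    ":": ("(?:", ")"), "seq": ("(?:", ")"),
    "$": ("(", ")"), "submatch": ("(", ")"),
    "w/nocase": ("(?i:", ")"), "w/case": ("(?-i:", ")"),
    "look-ahead": ("(?=", ")"), "look-behind": ("(?<=", ")"),
    "neg-look-ahead": ("(?!", ")"), "neg-look-behind": ("(?<!", ")"),
}


def sre_to_regex(sre):
    """Translate the <outer-sre> forms above (plus literal strings) to regex text."""
    if isinstance(sre, str):
        return re.escape(sre)  # literal string match
    if not (isinstance(sre, list) and sre and isinstance(sre[0], Sym)):
        raise ValueError(f"not an SRE: {sre!r}")
    head, args = sre[0].name, sre[1:]

    def body(items):
        # Concatenate the translated <inner-sre> arguments.
        return "".join(sre_to_regex(x) for x in items)

    if head in SIMPLE_QUANTIFIERS:
        return f"(?:{body(args)}){SIMPLE_QUANTIFIERS[head]}"
    if head == "=":  # (= <n> ...)
        return f"(?:{body(args[1:])}){{{args[0]}}}"
    if head == ">=":  # (>= <n> ...)
        return f"(?:{body(args[1:])}){{{args[0]},}}"
    if head in ("**", "**?"):  # (** <n> <m> ...), optionally non-greedy
        lazy = "?" if head == "**?" else ""
        return f"(?:{body(args[2:])}){{{args[0]},{args[1]}}}{lazy}"
    if head in ("|", "or"):
        alts = [sre_to_regex(x) for x in args]
        return "(?:" + "|".join(alts) + ")" if alts else "(?!)"
    if head in ("=>", "submatch-named"):  # <name> is a symbol
        return f"(?P<{args[0].name}>{body(args[1:])})"
    if head in WRAPPERS:
        opening, closing = WRAPPERS[head]
        return opening + body(args) + closing
    raise ValueError(f"unsupported SRE form: {head}")
```

For example, `[Sym("**?"), 2, 3, "ab"]`, i.e. `(**? 2 3 "ab")`, becomes `(?:ab){2,3}?`.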

<inner-sre> ::=
     | <outer-sre>
     | <inner-cset-sre>
     | <string>                    ; A literal string match.
     | bos                         ; Beginning of string.
     | eos                         ; End of string.

     | bol                         ; Beginning of line.
     | eol                         ; End of line.

     | bog                         ; Beginning of grapheme cluster.
     | eog                         ; End of grapheme cluster.
     | grapheme                    ; A single grapheme cluster.

     | bow                         ; Beginning of word.
     | eow                         ; End of word.
     | nwb                         ; A non-word boundary.
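
The symbol forms above are where a PCRE backend has to commit to exact escapes. One possible PCRE spelling, assumed rather than specified: `eos` uses `\z` because `\Z` also matches before a trailing newline, `grapheme` uses PCRE's `\X`, and `bog`/`eog` are left unmapped because there is no single PCRE escape for a grapheme-cluster boundary. `ANCHORS_PCRE` and `anchor_to_pcre` are hypothetical names.

```python
# Symbol forms from <inner-sre>, mapped to PCRE syntax (assumed spellings).
# None marks forms with no single PCRE equivalent.
ANCHORS_PCRE = {
    "bos": r"\A",          # start of subject
    "eos": r"\z",          # very end of subject (\Z would also allow a final newline)
    "bol": r"(?m:^)",      # start of any line
    "eol": r"(?m:$)",      # end of any line
    "bow": r"\b(?=\w)",    # word boundary followed by a word character
    "eow": r"\b(?<=\w)",   # word boundary preceded by a word character
    "nwb": r"\B",          # not a word boundary
    "grapheme": r"\X",     # one extended grapheme cluster (consumes input)
    "bog": None,
    "eog": None,
}


def anchor_to_pcre(name):
    """Return the PCRE fragment for an anchor symbol."""
    if name not in ANCHORS_PCRE:
        raise KeyError(f"unknown anchor: {name}")
    pattern = ANCHORS_PCRE[name]
    if pattern is None:
        raise NotImplementedError(f"{name} has no direct PCRE equivalent")
    return pattern
```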

<cset-sre> ::=
     | (<string>)                  ; literal char set
     | (/ <range-spec> ...)        ; ranges
     | (or <inner-cset-sre> ...)         ; union
     | (and <inner-cset-sre> ...)        ; intersection
     | (- <inner-cset-sre> ...)          ; difference
     | (~ <inner-cset-sre> ...)          ; complement of union
     | (w/case <inner-cset-sre> ...)     ; case and unicode toggling
     | (w/nocase <inner-cset-sre> ...)
     | (w/ascii <inner-cset-sre> ...)
     | (w/unicode <inner-cset-sre> ...)
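
The <cset-sre> operators are plain set algebra. A sketch that evaluates a subset of them to Python `frozenset`s, under these assumptions: complements are taken relative to ASCII, one-character strings stand in for Scheme chars, a hypothetical `Sym` class models symbols, and range specs are flattened into characters and read pairwise as inclusive ranges. The `w/...` toggles and named classes are not handled.

```python
from dataclasses import dataclass

# Assumption for this sketch: complements are relative to ASCII only.
UNIVERSE = frozenset(map(chr, range(128)))


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for a Scheme symbol such as `or`, `-` or `~`."""
    name: str


def cset(sre):
    """Evaluate a subset of <cset-sre> to a frozenset of characters."""
    if isinstance(sre, str):
        if len(sre) != 1:
            raise ValueError("a bare string in a cset must hold exactly one char")
        return frozenset(sre)  # <char> or "<char>"
    if isinstance(sre, list) and len(sre) == 1 and isinstance(sre[0], str):
        return frozenset(sre[0])  # (<string>): literal char set
    if not (isinstance(sre, list) and sre and isinstance(sre[0], Sym)):
        raise ValueError(f"not a cset SRE: {sre!r}")
    head, args = sre[0].name, sre[1:]
    if head == "/":
        chars = "".join(args)  # flatten strings and chars, then pair them up
        if len(chars) % 2:
            raise ValueError("range spec needs an even number of chars")
        return frozenset(chr(c)
                         for lo, hi in zip(chars[::2], chars[1::2])
                         for c in range(ord(lo), ord(hi) + 1))
    sets = [cset(a) for a in args]
    if head == "or":
        return frozenset().union(*sets)  # union
    if head == "and":
        return UNIVERSE.intersection(*sets)  # intersection
    if head == "-":
        return sets[0].difference(*sets[1:])  # difference
    if head == "~":
        return UNIVERSE - frozenset().union(*sets)  # complement of union
    raise ValueError(f"unsupported cset form: {head}")
```

For instance, `(- (/ "az") ("aeiou"))` evaluates to the 21 lowercase consonants.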

<inner-cset-sre> ::=
     | <cset-sre>
     | <char>                      ; literal char
     | "<char>"                    ; string of one char
     | <char-set>                  ; embedded SRFI 14 char set
     | any | nonl | ascii | lower-case | lower
     | upper-case | upper | alphabetic | alpha
     | numeric | num | alphanumeric | alphanum | alnum
     | punctuation | punct | symbol | graphic | graph
     | whitespace | white | space | printing | print
     | control | cntrl | hex-digit | xdigit

<range-spec> ::= <string> | <char>
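
Sharing one API with PCREs also means giving the named classes in <inner-cset-sre> a PCRE spelling. One possible table, assumed rather than taken from any implementation: most names map to POSIX bracket classes, and `symbol` maps to the Unicode property `\p{S}` because POSIX has no symbol class. In PCRE's default mode the POSIX classes only match ASCII characters (unless the UCP option is set), so this table corresponds most closely to a `w/ascii` context. `NAMED_CSETS` is a hypothetical name.

```python
# Each entry lists synonymous SRE names from <inner-cset-sre>, followed by
# a PCRE atom assumed to match the same characters.
_GROUPS = [
    (("any",), r"(?s:.)"),
    (("nonl",), r"[^\n]"),
    (("ascii",), r"[\x00-\x7F]"),
    (("lower-case", "lower"), "[[:lower:]]"),
    (("upper-case", "upper"), "[[:upper:]]"),
    (("alphabetic", "alpha"), "[[:alpha:]]"),
    (("numeric", "num"), "[[:digit:]]"),
    (("alphanumeric", "alphanum", "alnum"), "[[:alnum:]]"),
    (("punctuation", "punct"), "[[:punct:]]"),
    (("symbol",), r"\p{S}"),  # Unicode symbol categories; no POSIX class exists
    (("graphic", "graph"), "[[:graph:]]"),
    (("whitespace", "white", "space"), "[[:space:]]"),
    (("printing", "print"), "[[:print:]]"),
    (("control", "cntrl"), "[[:cntrl:]]"),
    (("hex-digit", "xdigit"), "[[:xdigit:]]"),
]

# Flatten the groups so every alias looks up the same PCRE atom.
NAMED_CSETS = {name: atom for names, atom in _GROUPS for name in names}
```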