I propose breaking SREs completely free of PCREs. Do away with * + ? = ?? *? **? etc. Have a single way to specify each operation. The short names are meaningless: unless you already know PCREs, '*' means multiplication and '**?' looks like comic book cuss words.
(zero-or-more <sre> ...) ; 0 or more matches -- or 'zero...' or just keep '*'
(one-or-more <sre> ...) ; 1 or more matches -- or 'one...' or just keep '+'
(maybe <sre> ...) ; 0 or 1 matches -- or 'optional' or just keep '?'
(repeat <n> <sre> ...) ; <n> or more matches
(repeat <m> <n> <sre> ...) ; <m> to <n> matches
(lazy <n> <sre> ...) ; <n> or more lazy matches
(lazy <m> <n> <sre> ...) ; <m> to <n> lazy matches
(non-greedy <n> <sre> ...) ; <n> or more non-greedy matches
(non-greedy <m> <n> <sre> ...) ; <m> to <n> non-greedy matches
(or <sre> ...) ; alternation
(and <sre> ...) ; sequencing
(submatch <name> <sre> ...) ; capturing a submatch -- do away with indexed submatches
'repeat', 'lazy', and 'non-greedy' are the general way to match a variable number of times: (zero-or-more <sre> ...) is the same as (repeat 0 <sre> ...).
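To make that equivalence concrete, here is a minimal sketch in Python (not Scheme, only so that it can be run as-is). It is not part of any SRE implementation: the tuple encoding of S-expressions and the names SHORTHAND, desugar, and to_pcre are hypothetical, invented for this illustration. It rewrites the three shorthand forms in terms of 'repeat' and then renders the result as a PCRE-style string for comparison.

```python
import re

# Assumption for this sketch: an SRE is either a Python str (literal text)
# or a tuple (operator, arg, ...) mirroring the S-expression forms above.
# Integer arguments are repeat counts.

# Hypothetical table: each shorthand form and the counts it stands for.
SHORTHAND = {
    "zero-or-more": (0,),    # (repeat 0 <sre> ...)
    "one-or-more": (1,),     # (repeat 1 <sre> ...)
    "maybe": (0, 1),         # (repeat 0 1 <sre> ...)
}

def desugar(sre):
    """Rewrite zero-or-more / one-or-more / maybe in terms of repeat."""
    if isinstance(sre, (str, int)):
        return sre
    op, *args = sre
    args = tuple(desugar(a) for a in args)
    if op in SHORTHAND:
        return ("repeat", *SHORTHAND[op], *args)
    return (op, *args)

def to_pcre(sre):
    """Render a (desugared) SRE as a PCRE-style pattern string."""
    if isinstance(sre, str):
        return re.escape(sre)
    op, *args = desugar(sre)
    if op in ("repeat", "lazy", "non-greedy"):
        counts = [a for a in args if isinstance(a, int)]
        body = "".join(to_pcre(a) for a in args if not isinstance(a, int))
        low = counts[0]
        high = counts[1] if len(counts) > 1 else ""   # open-ended if absent
        suffix = "" if op == "repeat" else "?"        # '?' marks a lazy match
        return "(?:%s){%s,%s}%s" % (body, low, high, suffix)
    if op == "or":
        return "(?:" + "|".join(to_pcre(a) for a in args) + ")"
    if op == "and":
        return "".join(to_pcre(a) for a in args)
    if op == "submatch":
        name, *body = args
        return "(?P<%s>%s)" % (name, "".join(to_pcre(b) for b in body))
    raise ValueError("unknown operator: %r" % (op,))

# Both spellings produce the same pattern string.
print(to_pcre(("zero-or-more", "ab")))   # (?:ab){0,}
print(to_pcre(("repeat", 0, "ab")))      # (?:ab){0,}
```

With this encoding the shorthand forms add nothing that 'repeat' cannot already express, which is the point of the proposal: they are conveniences, not separate operations.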
(char-range <range-spec> ...) ; ranges
(char-or <cset-sre> ...) ; union
(char-and <cset-sre> ...) ; intersection
(char-difference <cset-sre> ...) ; difference
(char-complement <cset-sre> ...) ; complement of union
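The intended meaning of the character-set forms can be sketched with ordinary sets. This is an illustration in Python under stated assumptions, not a proposed implementation: a <range-spec> is modeled as a (low, high) pair of single characters, which the forms above do not define, and the complement is taken against ASCII only to keep the example small. All function names are hypothetical, chosen to mirror the long names.

```python
# Assumption: the universe for char-complement is ASCII (128 characters).
ASCII = frozenset(chr(c) for c in range(128))

def char_range(*specs):
    """Union of inclusive ranges; each spec is a (low, high) pair."""
    chars = set()
    for low, high in specs:
        chars.update(chr(c) for c in range(ord(low), ord(high) + 1))
    return frozenset(chars)

def char_or(*csets):
    """Union of all the given character sets."""
    return frozenset().union(*csets)

def char_and(first, *rest):
    """Intersection of all the given character sets."""
    return frozenset(first).intersection(*rest)

def char_difference(first, *rest):
    """The first set minus every character in the remaining sets."""
    return frozenset(first).difference(*rest)

def char_complement(*csets, universe=ASCII):
    """Complement of the union of the given sets, within the universe."""
    return universe - char_or(*csets)

lower = char_range(("a", "z"))
digits = char_range(("0", "9"))
consonants = char_difference(lower, frozenset("aeiou"))
print(len(lower), len(consonants))                 # 26 21
print("!" in char_complement(lower, digits))       # True
```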
I admit to preferring '*' for zero-or-more, '+' for one-or-more, and '?' for maybe, but I have already been corrupted by PCREs. Either way, I think that we should have one or the other: having two names for each operation means there is that much more to remember in order to read an SRE. I know that I was the one who proposed long names for everything in the first place. After thinking about it more, I think that having two names for an operation is worse than having just a short name. But I really think that we should get rid of the short names and use the long names for everything -- or almost everything.
On 11/26/2013 5:01 AM, Alex Shinn wrote: