Re: english names for symbolic SREs

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

To: Alex Shinn <alexshinn@xxxxxxxxx>, SRFI-115 discussion list <srfi-115@xxxxxxxxxxxxxxxxx>

Subject: Re: english names for symbolic SREs

From: Michael Montague <mikemon@xxxxxxxxx>

Date: Tue, 26 Nov 2013 09:47:46 -0800

In-reply-to: <CAMMPzYOMNkno7=PdYkjqyE+vQayg5=jbxRTo=OZ0Jy9etqjg-w@mail.gmail.com>

References: <CAMMPzYOMNkno7=PdYkjqyE+vQayg5=jbxRTo=OZ0Jy9etqjg-w@mail.gmail.com>

I propose breaking SREs completely free of PCREs. Do away with * + ? = ?? *? **? etc. Have a single way to specify each operation. The short names are meaningless unless you already know PCREs: '*' means multiplication, and '**?' looks like comic book cuss words.

(zero-or-more <sre> ...) ; 0 or more matches -- or 'zero...' or just keep '*'
(one-or-more <sre> ...) ; 1 or more matches -- or 'one...' or just keep '+'
(maybe <sre> ...) ; 0 or 1 matches -- or 'optional' or just keep '?'
(repeat <n> <sre> ...) ; <n> or more matches
(repeat <m> <n> <sre> ...) ; <m> to <n> matches
(lazy <n> <sre> ...) ; <n> or more lazy matches
(lazy <m> <n> <sre> ...) ; <m> to <n> lazy matches
(non-greedy <n> <sre> ...) ; <n> or more non-greedy matches
(non-greedy <m> <n> <sre> ...) ; <m> to <n> non-greedy matches
(or <sre> ...) ; alternation
(and <sre> ...) ; sequencing
(submatch <name> <sre> ...) ; capturing a submatch -- do away with indexed submatches
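To make the proposed long names concrete, here is a minimal sketch in Python (the thread has no reference code) that translates them into PCRE-style pattern strings. The encoding is an assumption for illustration only: a string is a literal, and a tuple is (operator, args...). `to_pcre` and `_group` are hypothetical helpers, not SRFI 115 API. Since the list offers both 'lazy' and 'non-greedy', the sketch treats them as synonyms.

```python
import re

# Hypothetical encoding: a str is a literal; a tuple is (operator, *args).
# Operator names follow the long-name proposal; this is not SRFI 115's API.

def _is_count(x):
    # Counts are plain ints (bool is excluded on purpose).
    return isinstance(x, int) and not isinstance(x, bool)

def _group(body):
    # Wrap a sequence of SREs in a non-capturing group.
    return "(?:" + "".join(to_pcre(a) for a in body) + ")"

def to_pcre(sre):
    """Translate a long-name SRE (nested tuples) into a PCRE-style string."""
    if isinstance(sre, str):
        return re.escape(sre)
    op, *args = sre
    if op == "and":  # sequencing, as proposed in the message
        return "".join(to_pcre(a) for a in args)
    if op == "or":  # alternation
        return "(?:" + "|".join(to_pcre(a) for a in args) + ")"
    if op == "submatch":  # named capture only; no indexed submatches
        name, *body = args
        return f"(?P<{name}>" + "".join(to_pcre(a) for a in body) + ")"
    if op in ("zero-or-more", "one-or-more", "maybe"):
        suffix = {"zero-or-more": "*", "one-or-more": "+", "maybe": "?"}[op]
        return _group(args) + suffix
    if op in ("repeat", "lazy", "non-greedy"):
        # Two leading counts mean <m> to <n>; one count means <n> or more.
        if len(args) >= 2 and _is_count(args[0]) and _is_count(args[1]):
            bounds, body = f"{{{args[0]},{args[1]}}}", args[2:]
        else:
            bounds, body = f"{{{args[0]},}}", args[1:]
        lazy = "" if op == "repeat" else "?"
        return _group(body) + bounds + lazy
    raise ValueError(f"unknown operator: {op}")
```

For instance, (zero-or-more "a") becomes (?:a)* and (repeat 0 "a") becomes (?:a){0,}; the two patterns accept the same strings, which is why 'repeat' can serve as the general form.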

'repeat', 'lazy', and 'non-greedy' are the general way to match a variable number of times: (zero-or-more <sre> ...) is the same as (repeat 0 <sre> ...).

(char-range <range-spec> ...) ; ranges
(char-or <cset-sre> ...) ; union
(char-and <cset-sre> ...) ; intersection
(char-difference <cset-sre> ...) ; difference
(char-complement <cset-sre> ...) ; complement of union

I admit to preferring '*' for zero-or-more, '+' for one-or-more, and '?' for maybe, but I have already been corrupted by PCREs. But I think that we should have one or the other. Having two names for each operation means there is that much more to remember in order to be able to read an SRE. I know that I was the one who proposed long names for everything in the first place. After thinking about it more, I think that having two names for operations is worse than having just a short name.

But I really think that we should get rid of the short names and use the long names for everything, or almost everything.
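The proposed character-set operators (char-range, char-or, char-and, char-difference, char-complement) map directly onto ordinary set operations. This sketch rests on several assumptions: a cset is a Python frozenset of single characters, each range spec is a two-character string such as "az" (loosely modeled on SCSH's `/` form), and complements are taken relative to ASCII rather than full Unicode. All function names are hypothetical.

```python
# Hypothetical model: a cset is a frozenset of one-character strings.
# Complement is relative to ASCII here (an assumption for brevity).
UNIVERSE = frozenset(chr(i) for i in range(128))

def char_range(*specs):
    """Union of ranges; each spec is a two-char string like "az" (inclusive)."""
    out = set()
    for spec in specs:
        lo, hi = spec
        out.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
    return frozenset(out)

def char_or(*csets):
    """Union of all arguments (empty set when called with none)."""
    return frozenset().union(*csets)

def char_and(*csets):
    """Intersection of all arguments (the whole universe when called with none)."""
    return UNIVERSE.intersection(*csets)

def char_difference(first, *rest):
    """Characters in the first cset but in none of the others."""
    return first - char_or(*rest)

def char_complement(*csets):
    """Complement of the union of the arguments, relative to UNIVERSE."""
    return UNIVERSE - char_or(*csets)
```

Defining char-complement as the complement of a union, as the list does, means (char-complement a b) and (char-complement (char-or a b)) coincide.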

On 11/26/2013 5:01 AM, Alex Shinn wrote:

Traditionally SREs have had the following aliases allowing the user to choose between brevity and self-description:

From SCSH:
| or
& and
: seq

From IrRegex (in this case introducing a new short form):
$ submatch
=> submatch-named

For consistency Michael Montague suggested all SREs have a short and long form. John Cowan suggests the following names:

? optional
* zero-or-more
+ one-or-more
>= at-least
= exactly
** repeated
?? non-greedy-optional
*? non-greedy-zero-or-more
**? non-greedy-repeated

For the cset-sres we'd also need:
/ char-range (or cset-range?)
- difference (or diff?)
~ complement (or not?)
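The alias tables above amount to a mapping from short to long operator names. Here is a minimal sketch of normalizing an SRE to its long names, assuming a hypothetical Python tuple encoding in which a tuple's first element is the operator and strings are literals. Where the message offers alternatives in parentheses, the first choice is used. `ALIASES` and `normalize` are illustrative names, not SRFI 115 API.

```python
# Short-to-long aliases as listed in the message (first choice where
# alternatives are offered). Illustrative only; not a final SRFI 115 list.
ALIASES = {
    "|": "or", "&": "and", ":": "seq",              # from SCSH
    "$": "submatch", "=>": "submatch-named",         # from IrRegex
    "?": "optional", "*": "zero-or-more", "+": "one-or-more",
    ">=": "at-least", "=": "exactly", "**": "repeated",
    "??": "non-greedy-optional", "*?": "non-greedy-zero-or-more",
    "**?": "non-greedy-repeated",
    "/": "char-range", "-": "difference", "~": "complement",  # cset-sres
}

def normalize(sre):
    """Rewrite every operator position to its long name, recursively.

    Only the head of each tuple is renamed; literals and counts in
    argument positions are left untouched.
    """
    if isinstance(sre, tuple) and sre:
        head, *args = sre
        return (ALIASES.get(head, head), *(normalize(a) for a in args))
    return sre
```

Such a rewrite is trivial for a program. The cost raised earlier in the thread falls on the human reader, who would still need to know both columns to read an arbitrary SRE.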

I would suggest not introducing new short forms of existing long names. Comments welcome, but if there are no objections I'll go with this.

--
Alex