Re: english names for symbolic SREs

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

To: Alex Shinn <alexshinn@xxxxxxxxx>, SRFI-115 discussion list <srfi-115@xxxxxxxxxxxxxxxxx>

Subject: Re: english names for symbolic SREs

From: Michael Montague <mikemon@xxxxxxxxx>

Date: Sun, 01 Dec 2013 07:37:05 -0800

In-reply-to: <CAMMPzYN-L9yGBp-h7XV2A8aecBg0MHrL=dJBnGEaOdvAdpiMyQ@mail.gmail.com>

References: <CAMMPzYOMNkno7=PdYkjqyE+vQayg5=jbxRTo=OZ0Jy9etqjg-w@mail.gmail.com> <5294DEC2.6000908@gmail.com> <20131126182817.GJ20755@mercury.ccil.org> <52950EFB.1000101@gmail.com> <20131127012714.GC29339@mercury.ccil.org> <52957D2B.4030901@gmail.com> <CAMMPzYMtJ2E2BLqLR+jCCH8u+O1=ouhHFnLbJyh6ej90M6JLPA@mail.gmail.com> <20131127153702.GB6887@mercury.ccil.org> <y9lbo12aqgp.fsf@deinprogramm.de> <CAMMPzYN-L9yGBp-h7XV2A8aecBg0MHrL=dJBnGEaOdvAdpiMyQ@mail.gmail.com>

On 11/30/2013 3:26 PM, Alex Shinn wrote:

On Sat, Nov 30, 2013 at 11:56 PM, Michael Montague <mikemon@xxxxxxxxx> wrote:

On 11/27/2013 7:37 AM, John Cowan wrote:

> Alex Shinn scripsit:
>
>> It was John who insisted that the names be added, and John
>> who came up with most of the new names, so I'm assuming
>> he genuinely wants them.
> I do, though I didn't come up with the idea and in fact was initially
> against having more than one way to do it, but you convinced me otherwise.
> I think the long names are more self-documenting, more Schemey, and
> will make SREs more accessible to people who find string REs an
> abomination of the outer darkness.

Hypothetically, let's say that this SRFI specifies a new regular
expression syntax called NRE. It should be straightforward to transform
SREs into NREs. The existing SRE implementations (IrRegex and SCSH) can
provide a procedure sre->nre which people with existing SREs can use and
their code is not gratuitously left behind.
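
To make the idea of a mechanical translation concrete, here is a minimal sketch of such a procedure. It is only an illustration under stated assumptions: NRE is hypothetical, so the sketch simply renames a few short SRE operators to the long names used in the finalized SRFI 115 (which may differ from the draft names under discussion here). The names `short->long` and `sre->long` are invented for this example and are not part of IrRegex or SCSH.

```scheme
;; Hypothetical sketch: rename short SRE operators to long names.
;; The table is an assumption for illustration and covers only a few operators.
(define short->long
  '((*  . zero-or-more)
    (+  . one-or-more)
    (?  . optional)
    (:  . seq)
    (=  . exactly)
    (>= . at-least)
    (** . repeated)))

;; Walk an SRE and rewrite the operator at the head of each list form.
;; Strings, characters, numbers, and unknown symbols pass through unchanged.
(define (sre->long sre)
  (if (pair? sre)
      (let* ((head (car sre))
             (entry (and (symbol? head) (assq head short->long))))
        (cons (if entry (cdr entry) head)
              (map sre->long (cdr sre))))
      sre))

(sre->long '(: "ab" (* (+ "c")) (= 3 "d")))
;; => (seq "ab" (zero-or-more (one-or-more "c")) (exactly 3 "d"))
```

Because SREs are ordinary list structure, the rewrite is a plain tree walk. A real translator would also need to handle operators whose arguments are not themselves SREs.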

The problem with providing both short names and long names is that when
I write SREs I can just use the long names, but when I read other
people's SREs, I potentially still need to know both.

The short names have been in use for over a decade.
They are much friendlier to people used to PCREs, which
honestly is our primary target. Brevity is thus an
important feature since people will compare the length
of these to PCREs. With the short names there are even
cases where SREs are shorter than PCRE, for example

`(:,x(*,y))

versus

"(?:\Q$x\E(?:\Q$y\E)*)"

Moreover, brevity is inherently important because people
will type these interactively into editors to search, and here
the number of keystrokes really matters.

If you think having two names is a bad idea we can still
remove the long names.

There already exists an extremely widely used regular expression syntax designed for brevity. We do not need to standardize another one designed for brevity. We need to standardize a regular expression syntax that is readable, understandable, and maintainable by people other than regular expression experts. We need one that fits with Scheme and is friendly to Scheme programmers.

SREs use three short names in common with PCREs: '*', '+', and '?'. One short name, '$', has its meaning changed from PCREs. The rest are unique to SREs: '=', '>=', '**', ':', '=>', '??', '*?', '**?', '/', '-', and '~'. The only reason I can think of that these would be friendly to people used to PCREs is that they are already trained to believe that regular expression syntax has to be cryptic.

I think two names is a bad idea, but I want to get rid of the short ones. The regular expression syntax that I think we should be standardizing does not have brevity as a goal.

When I asked early on, "what are the benefits of the SRE syntax?" I got a strong reaction. To me, the advantages of list structure do not outweigh the disadvantages of having to learn yet more cryptic operator names. When I want to write a regular expression, I could pull out the documentation for SREs and figure out how to do it. But when I come back a month later to change it or fix a bug, I would have to pull out the documentation again. Why bother? I might as well just use PCREs; at least then anything I do retain can be used outside the world of Scheme.

and I stand by it. Why bother standardizing another
If we are going to try to standardize another regular expression syntax designed for