
Re: english names for symbolic SREs



On 11/30/2013 3:26 PM, Alex Shinn wrote:
On Sat, Nov 30, 2013 at 11:56 PM, Michael Montague <mikemon@xxxxxxxxx> wrote:
On 11/27/2013 7:37 AM, John Cowan wrote:
> Alex Shinn scripsit:
>
>> It was John who insisted that the names be added, and John
>> who came up with most of the new names, so I'm assuming
>> he genuinely wants them.
> I do, though I didn't come up with the idea and in fact was initially
> against having more than one way to do it, but you convinced me otherwise.
> I think the long names are more self-documenting, more Schemey, and
> will make SREs more accessible to people who find string REs an
> abomination of the outer darkness.

Hypothetically, let's say that this SRFI specifies a new regular
expression syntax called NRE. It should be straightforward to transform
SREs into NREs. The existing SRE implementations (IrRegex and SCSH) can
provide a procedure sre->nre that people with existing SREs can use, so
that their code is not gratuitously left behind.
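
As a rough illustration of how simple such a procedure could be, here is a
minimal sketch. It assumes, purely for the sake of example, that an NRE
differs from an SRE only in its operator names. The table short->long and
the long names in it are hypothetical and deliberately partial; they are
not taken from any implementation.

```scheme
;; Hypothetical sketch: rewrite short SRE operator names into long ones.
;; The long names below are assumptions for illustration, and the table
;; is deliberately partial.
(define short->long
  '((:  . seq)
    (*  . zero-or-more)
    (+  . one-or-more)
    (?  . optional)
    (=  . exactly)
    (>= . at-least)
    (** . repeated)))

;; Walk the SRE. In operator position, replace a known short name with
;; its long name; strings, characters, numbers, and unknown symbols are
;; left unchanged.
(define (sre->nre sre)
  (if (pair? sre)
      (let ((entry (assq (car sre) short->long)))
        (cons (if entry (cdr entry) (car sre))
              (map sre->nre (cdr sre))))
      sre))

(sre->nre '(: "ab" (* (+ "c") (? "d"))))
;; => (seq "ab" (zero-or-more (one-or-more "c") (optional "d")))
```

Only the symbol in operator position is rewritten, so a symbol that
appears as an argument (a named character set, say) passes through
untouched.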

The problem with providing both short names and long names is that when
I write SREs I can just use the long names, but when I read other
people's SREs, I potentially still need to know both.

The short names have been in use for over a decade.
They are much friendlier to people used to PCREs, which
honestly is our primary target.  Brevity is thus an
important feature since people will compare the length
of these to PCREs.  With the short names there are even
cases where SREs are shorter than PCRE, for example

  `(:,x(*,y))

versus

  "(?:\Q$x\E(?:\Q$y\E)*)"

Moreover, brevity is inherently important because people
will type these interactively into editors to search, and here
the number of keystrokes really matters.

If you think having two names is a bad idea we can still
remove the long names.

There already exists an extremely widely used regular expression syntax designed for brevity. We do not need to standardize another one designed for brevity. We need to standardize a regular expression syntax that is readable, understandable, and maintainable by people other than regular expression experts. We need one that fits with Scheme and is friendly to Scheme programmers.

SREs use three short names in common with PCREs: '*', '+', and '?'. One short name, '$', has its meaning changed from PCREs. The rest are unique to SREs: '=', '>=', '**', ':', '=>', '??', '*?', '**?', '/', '-', and '~'. The only reason I can think of that these would be friendly to people used to PCREs is that they are already trained to believe that regular expression syntax has to be cryptic.

I think two names is a bad idea, but I want to get rid of the short ones. The regular expression syntax that I think we should be standardizing does not have brevity as a goal.

When I asked early on, "what are the benefits of the SRE syntax" I got a strong reaction. To me, the advantages of list structure do not outweigh the disadvantages of having to learn yet more cryptic operator names. When I want to write a regular expression, I could pull out the documentation for SREs and figure out how to do it. But when I come back a month later to change it or fix a bug, I would have to pull out the documentation again. Why bother? I might as well just use PCREs; at least then anything I do retain can be used outside the world of Scheme.



and I stand by it. Why bother standardizing another
If we are going to try to standardize another regular expression syntax designed for