Re: benefits of SRE syntax

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

On Wed, Oct 16, 2013 at 11:44:40AM -0700, Michael Montague wrote:
> The rationale also lists three benefits of the SRE syntax:
> (1) They are easier to read.
> (2) They are easier to extend.
> (3) They are both faster and simpler to compile.
> On benefit (1): they are more verbose, and readability is subjective. 

We're Schemers, aren't we?  Verbosity (the good kind which provides
more clarity) is considered a Good Thing around here :)

> They will look different to anyone who has already learned the 
> traditional syntax of regular expressions.

Yet, whenever working in other languages, I end up cursing the fiddly
and stupid in-string syntax.  I want to match a literal parenthesis: do
I escape it?  Or maybe double-escape, depending on the language I'm in
and the type of string quotes I am using.  Oh shit, this tool defaults
to accepting basic regexes so I don't need to escape the parenthesis.
But maybe that's unfair and I should be looking at one particular regex
implementation instead of the mess that the UNIX world made of these
things (did someone just run away screaming about shell globbing?).
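
To make the escaping question concrete, here is a minimal sketch using
Python's standard `re` module (my choice for illustration, not something
from the SRFI).  It only shows the two layers involved, the string literal
and the regex itself; the helper `compiles` is a made-up name.

```python
import re

# Layer 1: the string literal.  A raw string and a doubled backslash
# spell the same two-character pattern: backslash, open parenthesis.
same_pattern = r"\(" == "\\("

# Layer 2: the regex.  The escaped parenthesis matches a literal "(".
literal_paren = re.compile(r"\(")
paren_index = literal_paren.search("f(x)").start()

def compiles(pattern: str) -> bool:
    """Hypothetical helper: report whether `pattern` is a valid regex."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# An unescaped "(" opens a group that is never closed, so it is rejected.
bare_paren_ok = compiles("(")
```

In a tool that defaults to POSIX basic regexes the rule is the other way
around, which is exactly the kind of thing one has to remember per tool.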

And don't even get me started on the mess you get when you begin
composing regexes from subparts!  Grouping should be distinct from
submatch definition.  SREs compose very easily, especially when
you're using named submatches.  And best of all, you don't need to
mess around with escaping user string input (which *everybody* forgets,
leading to correctness and security issues, especially if your PCRE
version includes an "eval" escape hatch *looks at PHP*).
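
For comparison, this is roughly what composing from subparts looks like
with string regexes, sketched in Python's `re` module.  The names `SCHEME`,
`HOST`, and `url_with_path_prefix` are invented for the example.  Note
that grouping and submatch capture need different spellings, and that the
user-supplied part has to be escaped by hand.

```python
import re

# Subparts kept as plain strings.
# (?P<name>...) captures a named submatch; (?:...) only groups.
SCHEME = r"(?P<scheme>https?|ftp)"
HOST = r"(?:www\.)?(?P<host>[A-Za-z0-9.-]+)"

def url_with_path_prefix(prefix: str) -> re.Pattern:
    """Match a URL whose path starts with the literal text `prefix`."""
    # `prefix` is untrusted input: re.escape makes "+", "(" and ")" literal.
    return re.compile(rf"{SCHEME}://{HOST}/{re.escape(prefix)}")

text = "see http://www.example.org/a+b(1)/index.html"
user_prefix = "a+b(1)"

safe = url_with_path_prefix(user_prefix).search(text)

# Forgetting the escape silently changes the meaning: "a+" becomes
# "one or more a" and "(1)" becomes an extra capture group.
naive = re.compile(rf"{SCHEME}://{HOST}/{user_prefix}").search(text)
```

With the escape in place `safe` carries the named submatches `scheme` and
`host`; without it `naive` finds nothing in this text, and with other
input it could match something the user never asked for.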

Yes, all these things *can* be done with PCRE, but it's all very painful
and difficult to remember.  I can't even remember what all the \X-style
escaped letters do (some more verbosity would really help PCRE).

> On benefit (3): some (most?) implementations will compile the SREs to 
> the traditional syntax and use a library like PCRE.

There are currently three implementations that I'm aware of which
implement SREs.  Two of them are by the SRFI's author, and both are
100% native Scheme code, which implement matching via custom state
machines.  The other is from the inventor of this syntax, who indeed
used PCRE as a backend, probably for convenience.

I don't understand why you think most implementations will write their
own SRE-to-PCRE compiler when there are two high-quality modern libraries
readily available under liberal licenses.  My guess is most implementations
that want to support this will go the easy way and use (or at least start
with) this SRFI's reference implementation.

> I don't have a sense for the value of benefit (2), maybe it is enough to 
> make the SRE syntax worth it. Benefits (1) and (3) don't seem like 
> strong enough arguments to merit requiring the SRE syntax.

I think the benefit of extensibility of the SRE syntax itself may be a
good thing because it encourages experimentation, and doesn't require
the implementor of new features to shoehorn those into an already
extremely crowded syntax space.  However, it is not the most important
reason I like SREs for day-to-day use.  The most important reason is
pretty mundane: it's because I can use Paredit to manipulate them :)