[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: SRFI-115 issues

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.



On Mon, Oct 21, 2013 at 6:37 AM, John Cowan <cowan@xxxxxxxxxxxxxxxx> wrote:
Alex Shinn scripsit:

> I've kept most IrRegex extensions, but made many of the non-POSIX
> ones optional, designated by the regexp-extended feature, and backref
> specifically gets its own feature regexp-backrefs.

This troubles me.  It leaves things too much up to the implementation,
and not enough flexibility for the user.  These extensions work only if
you have a backtracking NFA, which is inherently less efficient.  In
order to provide both efficiency and power, the implementation would
have to provide both an NFA (to be used in the general case) and a DFA
or Thompson-NFA (to be used if the extensions are not needed).  This is
what Perl does, but Perl is a rag-bag by nature.

I'd say: leave these things out of the main library, but add another
library that provides them but using the same API.  This way, the user
can load (srfi 115) or (srfi 115 extended) and get the most suitable
engine.  Of course, they can be the same engine if the implementer
doesn't care that much about speed.  If the user needs to load both,
using the R6RS/R7RS prefix feature makes both APIs available.

I think this is a recipe for confusion.  APIs will want to
allow regexps as parameters - which library should they
choose?

I actually need to separate the features better because
things like "non-greedy" repetitions can actually be
supported by non-backtracking implementations, "if" is
just a shortcut for "or" with a look-ahead, and I have to
think about whether it's possible for atomic/commit to
be supported without backtracking.

Ultimately with a little effort everything can be supported.
One trick to support backreferences in DFA impls is to
replace them with .* and use post-processing to verify.
So it's more a matter of what's readily available, not what's
possible.

That said, I _really_ think these should be a single
library.  If for whatever reason using two underlying
implementations is more practical, so be it - that's what
IrRegex does.  Backtracking implementations are
easy to write.

Also note, in addition to the features you can programmatically
use `valid-sre?'.

> regexp->sre is frequently requested in IrRegex. It is useful and the
> only argument against it is that it would require more memory for
> compiled regexps (linearly more for most implementations), but I'll
> wait to see if it's requested in the discussion.

I think it's an important thing to have.  The wording should allow for
either caching within the regexp object or decompilation, and should
warn that caching may produce a space leak.

I will include this in the next draft with a warning
that the SRE will be functionally equivalent but
may not be equal? to the SRE the regexp was
compiled from.

-- 
Alex