Re: regexp and valid-sre?

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

On Wed, Nov 27, 2013 at 3:00 AM, Michael Montague <mikemon@xxxxxxxxx> wrote:
On 11/26/2013 6:17 AM, Peter Bex wrote:
On Tue, Nov 26, 2013 at 09:44:27PM +0900, Alex Shinn wrote:
On Tue, Nov 26, 2013 at 12:34 PM, Michael Montague <mikemon@xxxxxxxxx> wrote:

Why can the procedure 'regexp' be called with an already compiled <re>
which is just returned?
Convenience, I'd say.  That way you can create modules which have an
interface that accepts either SREs or regexp objects (like irregex does),
having it automatically compile SREs.
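
A minimal sketch of that convenience, assuming an R7RS system that provides the (srfi 115) library. The helper name `count-matching' is hypothetical and used only for illustration; it normalizes its argument with `regexp' once, so callers can pass either form:

```scheme
(import (scheme base) (srfi 115))

;; Hypothetical helper: counts the strings that RE matches in full.
;; RE may be an SRE or an already compiled regexp object.
(define (count-matching re strings)
  (let ((rx (regexp re)))  ; compiles an SRE; returns a compiled regexp as-is
    (let loop ((ls strings) (n 0))
      (cond ((null? ls) n)
            ((regexp-matches? rx (car ls)) (loop (cdr ls) (+ n 1)))
            (else (loop (cdr ls) n))))))

;; Both call styles go through the same code path.
(count-matching '(+ numeric) '("123" "abc" "42"))           ; => 2
(count-matching (regexp '(+ numeric)) '("123" "abc" "42"))  ; => 2
```

Because `regexp' is a no-op on compiled objects, the helper needs no type dispatch of its own.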

Why is the procedure 'valid-sre?' necessary? You could just call 'regexp'
and use 'guard' to check for any errors.

Indeed, in fact `valid-sre?' could be defined as:

   (define (valid-sre? x)
     (guard (exn (else #f)) (regexp x) #t))

Whether you want to test in advance or catch errors
after the fact is a matter of personal style.
And in some implementations compiling might be a lot more expensive than
simply checking, and if you're just providing on-the-fly feedback to a
user while building a regex dynamically (for example), it might be better
or more efficient to use valid-sre? instead of compiling.

I'm sure that in Irregex at least the DFA compilation is much more
expensive for complex regexes than a simple "is it valid"-type check
would be.
I don't think that these are strong arguments for having 'valid-sre?'. An implementation for which compiling is expensive could easily do the "is it valid"-type check internally before compiling. Having it in the interface adds no functionality that is not already easily available.

DFA compilation can actually require exponential time
(and space) in the worst case, so performance is a concern.

As a use-case consider something like (apologies if this
is somewhat contrived):

  (define (search db re)
    (if (not (valid-sre? re))
        (log-bad-sre re)
        (filter (lambda (s) (regexp-matches? re s)) (lookup db field))))

where 1) we're part of a running server and don't want to throw
uncaught exceptions, and 2) we don't want to make the lookup call
at all if the re is not valid.  The alternative satisfying these conditions
is to inline the above definition of `valid-sre?':

  (define (search db re)
    (if (not (guard (exn (else #f)) (regexp re) #t))
        (log-bad-sre re)
        (filter (lambda (s) (regexp-matches? re s)) (lookup db field))))

This is longer and harder to read.  I prefer having the `valid-sre?'
utility here.

I propose dropping 'valid-sre?'.

I and all the others in the discussion seem to disagree with you.