Re: Effect of string mutation on submatch extraction

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.

To: Evan Hanson <evhan@xxxxxxxxxxxxxxx>

Subject: Re: Effect of string mutation on submatch extraction

From: Alex Shinn <alexshinn@xxxxxxxxx>

Date: Fri, 15 Nov 2013 21:12:29 +0900

Cc: SRFI-115 discussion list <srfi-115@xxxxxxxxxxxxxxxxx>

In-reply-to: <20131115084412.GB9741@capsaicin>

References: <20131115084412.GB9741@capsaicin>

On Fri, Nov 15, 2013 at 5:44 PM, Evan Hanson <evhan@xxxxxxxxxxxxxxx> wrote:

> Hi,
>
> After removing the string arguments from the submatch extraction
> procedures, should the results of things like the following be
> specified?
>
>   (define s "abc")
>   (define m (regexp-search "b" s))
>   (regexp-match-submatch m 0) ; => "b"
>   (string-set! s 1 #\B)
>   (regexp-match-submatch m 0) ; => "b", "B", or undefined?
>
> Chibi returns "B", which I also think is the better option since it's
> simpler and probably more efficient, though "b" is arguably nicer for
> the user. Either way, is it worth specifying one (even if that's just
> "undefined")?

A very good point. I think we should explicitly state that this is
undefined.

I had also been planning to note that the effects of mutating an SRE
passed to `regexp' are undefined. Making a full copy on every
compilation is too expensive, especially when considering huge
mutable Unicode char-sets.

Alex
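
A minimal sketch of how calling code could avoid relying on the
unspecified behaviour discussed above: search a private copy of the
string, so that later mutation of the original cannot reach the match
object. This assumes an R7RS system that provides (srfi 115); the
variable names are illustrative only, and the last result holds only
on the assumption that the match object refers to nothing but the
copy it was given.

```scheme
(import (scheme base) (srfi 115))

;; String literals may be immutable in R7RS, so start from a mutable copy.
(define s (string-copy "abc"))

;; Search a private copy that no other code holds a reference to.
(define m (regexp-search "b" (string-copy s)))

;; Mutating the original string no longer touches the searched string.
(string-set! s 1 #\B)

;; Expected to be "b" under the assumption stated above.
(regexp-match-submatch m 0)
```

The trade-off is one extra string copy per search, which is the kind
of cost the reply above prefers not to impose on every caller.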