
Re: english names for symbolic SREs

This page is part of the web mail archives of SRFI 115 from before July 7th, 2015. The new archives for SRFI 115 contain all messages, not just those from before July 7th, 2015.



On 11/26/2013 5:01 AM, Alex Shinn wrote:
Traditionally SREs have had the following aliases
allowing the user to choose between brevity and
self-description:

From SCSH:

  | or
  & and
  : seq
 
From IrRegex (in this case introducing a new short form):

  $ submatch
  => submatch-named

For consistency Michael Montague suggested all
SREs have a short and long form.  John Cowan
suggests the following names:

 ? optional
 * zero-or-more
 + one-or-more
 >= at-least
 = exactly
 ** repeated
 ?? non-greedy-optional
 *? non-greedy-zero-or-more
 **? non-greedy-repeated

For the cset-sres we'd also need:

  / char-range (or cset-range?)
  - difference (or diff?)
  ~ complement (or not?)

I would suggest not introducing new short forms
of existing long names.  Comments welcome, but
if there are no objections I'll go with this.

-- 
Alex


I have an alternative suggestion for English names. After thinking about it a lot more, I am not sure that there needs to be a one-to-one match between short names and English names, just the same expressive power.

How about 'maybe', 'greedy', and 'non-greedy':

? maybe
* (greedy 0 <sre> ...)
+ (greedy 1 <sre> ...)
>= (greedy <n> <sre> ...)
= (greedy <n> <n> <sre> ...)
** (greedy <n> <m> <sre> ...)
?? (non-greedy 0 1 <sre> ...)
*? (non-greedy 0 <sre> ...)
**? (non-greedy <n> <m> <sre> ...)

There are only three names to remember and, at least to me, they strongly suggest what they do.
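
To make the correspondence in the table above concrete, here is a rough sketch of a rewriter from the suggested names back to the existing short forms. It is only an illustration, not part of any implementation: the procedure names english->short and rewrite-repeat are made up for this message, 'named' and 'indexed' are the renamings suggested below, and I am assuming plain R7RS Scheme. 'seq' and 'or' are left alone since they are already accepted aliases.

```scheme
;; Sketch only. english->short and rewrite-repeat are hypothetical names.
;; Rewrites SREs that use 'maybe', 'greedy', 'non-greedy', 'named' and
;; 'indexed' into the existing short forms.

(define (rewrite-repeat greedy? args)
  ;; args is (<n> <sre> ...) or (<n> <m> <sre> ...)
  (let ((n (car args))
        (rest (cdr args)))
    (if (and (pair? rest) (number? (car rest)))
        ;; Bounded form: (greedy <n> <m> <sre> ...)
        (let ((m (car rest))
              (body (map english->short (cdr rest))))
          (cond ((and greedy? (= n m)) `(= ,n ,@body))
                ((and greedy? (= n 0) (= m 1)) `(? ,@body))
                (greedy? `(** ,n ,m ,@body))
                ((and (= n 0) (= m 1)) `(?? ,@body))
                (else `(**? ,n ,m ,@body))))
        ;; Open-ended form: (greedy <n> <sre> ...)
        (let ((body (map english->short rest)))
          (cond ((and greedy? (= n 0)) `(* ,@body))
                ((and greedy? (= n 1)) `(+ ,@body))
                (greedy? `(>= ,n ,@body))
                ((= n 0) `(*? ,@body))
                ;; The table has no short form for this case.
                (else (error "no short form for open-ended non-greedy" n)))))))

(define (english->short sre)
  (if (not (pair? sre))
      sre                               ; symbols, strings, chars, numbers
      (let ((head (car sre))
            (args (cdr sre)))
        (case head
          ((maybe) `(? ,@(map english->short args)))
          ((greedy) (rewrite-repeat #t args))
          ((non-greedy) (rewrite-repeat #f args))
          ((named) `(=> ,(car args) ,@(map english->short (cdr args))))
          ((indexed) `($ ,@(map english->short args)))
          ;; Anything else (seq, or, w/nocase, cset strings) is kept,
          ;; with its sub-expressions rewritten.
          (else (cons (english->short head)
                      (map english->short args)))))))

(english->short '(named year (greedy 4 4 digit)))
;; evaluates to (=> year (= 4 digit))

(english->short '(seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space)))
;; evaluates to (seq (* space) (? ("-_,;:/")) (* space))
```

The second argument of 'greedy' is treated as a maximum count only when it is a number, which is unambiguous because a bare number is not itself an SRE.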

I also suggest shortening 'submatch-named' to just 'named', and either renaming 'submatch' to 'indexed' or adding 'indexed' as an alias for it.

Here is an example I took from an earlier message of Alex's. I munged it into a single regular expression; blame me for any errors. I am hoping to make time this weekend to find and convert more examples of SREs to English names for us to look at.

I find both versions using English names to be more readable. I like 'greedy' etc. because there are fewer distinct operators, the names are shorter, and the repetition operators stand out. I like having 'maybe' even though (greedy 0 1 ...) would work; an optional match does not feel like repetition to me. The only oddity is that a non-greedy maybe needs to be written (non-greedy 0 1 ...).

;; ---- Using existing SRE short names ----
;; YYYY-MM-DD family
(w/nocase
    ;; Year
     (=> year (= 4 digit)) (: (* space) (? ("-_,;:/")) (* space))
    ;; Month
    (=> mon (= 2 digit)) (: (* space) (? ("-_,;:/")) (* space))
    ;; Day
    (=> day (= 2 digit))
    ;; Time
    (? (? "T") (: (* space) (? ("-_,;:/")) (* space))
        ;; Hour
        (=> hour (= 2 digit)) (: (* space) (? ("-_,;:/")) (* space))
        ;; Minute
        (=> min (= 2 digit)) (: (* space) (? ("-_,;:/")) (* space))
        ;; Second
        (=> sec (= 2 digit))
        ;; Timezone
            (? (: (* space) (? ("-_,;:/")) (* space))
                (=> tz (or (: ("+-") (= 2 digit) ("013") ("05"))
                (: word "/" word) (: (? "(") (= 3 alpha) (? ")")))))))

;; ---- Using the English names that I am suggesting ----
;; YYYY-MM-DD family
(w/nocase
    ;; Year
     (named year (greedy 4 4 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
    ;; Month
    (named mon (greedy 2 2 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
    ;; Day
    (named day (greedy 2 2 digit))
    ;; Time
    (maybe (maybe "T") (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
        ;; Hour
        (named hour (greedy 2 2 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
        ;; Minute
        (named min (greedy 2 2 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
        ;; Second
        (named sec (greedy 2 2 digit))
        ;; Timezone
            (maybe (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
                (named tz (or (seq ("+-") (greedy 2 2 digit) ("013") ("05"))
                (seq word "/" word) (seq (maybe "(") (greedy 3 3 alpha) (maybe ")")))))))

;; ---- Using the English names that John proposed ----
;; YYYY-MM-DD family
(w/nocase
    ;; Year
     (submatch-named year (exactly 4 digit))
    (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
    ;; Month
    (submatch-named mon (exactly 2 digit))
    (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
    ;; Day
    (submatch-named day (exactly 2 digit))
    ;; Time
    (optional (optional "T") (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
        ;; Hour
        (submatch-named hour (exactly 2 digit))
        (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
        ;; Minute
        (submatch-named min (exactly 2 digit))
        (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
        ;; Second
        (submatch-named sec (exactly 2 digit))
        ;; Timezone
            (optional (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
                (submatch-named tz (or (seq ("+-") (exactly 2 digit) ("013") ("05"))
                (seq word "/" word) (seq (optional "(") (exactly 3 alpha) (optional ")")))))))