Re: english names for symbolic SREs



On 11/26/2013 5:01 AM, Alex Shinn wrote:
Traditionally SREs have had the following aliases
allowing the user to choose between brevity and
self-description:

From SCSH:

  | or
  & and
  : seq
 
From IrRegex (in this case introducing a new short form):

  $ submatch
  => submatch-named

For consistency Michael Montague suggested all
SREs have a short and long form.  John Cowan
suggests the following names:

 ? optional
 * zero-or-more
 + one-or-more
 >= at-least
 = exactly
 ** repeated
 ?? non-greedy-optional
 *? non-greedy-zero-or-more
 **? non-greedy-repeated

For the cset-sres we'd also need:

  / char-range (or cset-range?)
  - difference (or diff?)
  ~ complement (or not?)

I would suggest not introducing new short forms
of existing long names.  Comments welcome, but
if there are no objections I'll go with this.

-- 
Alex


I have an alternative suggestion for English names. After thinking about it a lot more, I am not sure there needs to be a one-to-one match between short names and English names, only the same expressive power.

How about 'maybe', 'greedy', and 'non-greedy':

?    (maybe <sre> ...)
*    (greedy 0 <sre> ...)
+    (greedy 1 <sre> ...)
>=   (greedy <n> <sre> ...)
=    (greedy <n> <n> <sre> ...)
**   (greedy <n> <m> <sre> ...)
??   (non-greedy 0 1 <sre> ...)
*?   (non-greedy 0 <sre> ...)
**?  (non-greedy <n> <m> <sre> ...)

There are only three names to remember and, at least to me, they strongly suggest what they do.

I also suggest changing 'submatch-named' to just 'named', and either adding 'indexed' as an alias for 'submatch' or renaming 'submatch' to 'indexed'.
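To make the mapping concrete, here is a rough sketch of a rewriter from the short names to the names suggested above. It is written in Python only so it can be run without an SRE implementation, and it makes some assumptions: an SRE is modelled as nested Python lists, symbols are plain strings, and string literals are wrapped in a hypothetical Lit class. The names Lit, REWRITES, to_english and show are mine, not part of any proposal.

```python
class Lit(str):
    """A string literal inside an SRE; plain str values stand for symbols."""


# Each entry builds the suggested English form from the operator's arguments.
# Assumption: arguments arrive in the order shown in the table above.
REWRITES = {
    "?":   lambda args: ["maybe", *args],
    "*":   lambda args: ["greedy", 0, *args],
    "+":   lambda args: ["greedy", 1, *args],
    ">=":  lambda args: ["greedy", *args],            # (>= n sre ...)
    "=":   lambda args: ["greedy", args[0], *args],   # n doubles as the upper bound
    "**":  lambda args: ["greedy", *args],            # (** n m sre ...)
    "??":  lambda args: ["non-greedy", 0, 1, *args],
    "*?":  lambda args: ["non-greedy", 0, *args],
    "**?": lambda args: ["non-greedy", *args],        # (**? n m sre ...)
    "=>":  lambda args: ["named", *args],
    "$":   lambda args: ["indexed", *args],
    ":":   lambda args: ["seq", *args],
    "|":   lambda args: ["or", *args],
}


def to_english(sre):
    """Recursively replace short operator names with the suggested English ones."""
    if not isinstance(sre, list) or not sre:
        return sre
    head, *args = sre
    args = [to_english(a) for a in args]
    # A Lit head such as ("*") is a character set, not the * operator.
    if isinstance(head, str) and not isinstance(head, Lit) and head in REWRITES:
        return REWRITES[head](args)
    return [to_english(head) if isinstance(head, list) else head, *args]


def show(sre):
    """Render the nested-list form back into S-expression text."""
    if isinstance(sre, list):
        return "(" + " ".join(show(x) for x in sre) + ")"
    if isinstance(sre, Lit):
        return '"' + sre + '"'
    return str(sre)


year = ["=>", "year", ["=", 4, "digit"]]
sep = [":", ["*", "space"], ["?", [Lit("-_,;:/")]], ["*", "space"]]
print(show(to_english(year)))  # (named year (greedy 4 4 digit))
print(show(to_english(sep)))   # (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
```

Note that the rewriter only needs three repetition names on the output side, which is the point of the suggestion: the bounds carry the rest of the information.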

Here is an example I took from an earlier message of Alex's. I munged it into a single regular expression; blame me for any errors. I am hoping to make time this weekend to find and convert more examples of SREs to English names for us to look at.

I find both versions using English names more readable. I like 'greedy' etc. because there are fewer distinct operators, the names are shorter, and the repetition operators stand out. I like having 'maybe' even though (greedy 0 1 ...) would work; an optional match does not feel like repetition to me. The only oddball is that a non-greedy maybe needs to be written (non-greedy 0 1 ...).
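For anyone who wants to check the claim that an optional match is just a 0-to-1 repetition, here is a small sketch using Python's re module as a stand-in. This is an analogy under an assumption: that SRE greedy and non-greedy repetition behave like the corresponding constructs in Python's backtracking regex syntax. The helper name first_match is hypothetical.

```python
import re


def first_match(pattern, text):
    """Return what `pattern` matches at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None


# Greedy: '?' and '{0,1}' both take the "T" when it is there,
# analogous to (maybe "T") and (greedy 0 1 "T").
print(repr(first_match(r"T?", "T12")))       # 'T'
print(repr(first_match(r"T{0,1}", "T12")))   # 'T'

# Non-greedy: '??' and '{0,1}?' both prefer the empty match,
# analogous to (non-greedy 0 1 "T").
print(repr(first_match(r"T??", "T12")))      # ''
print(repr(first_match(r"T{0,1}?", "T12")))  # ''
```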

;; ---- Using existing SRE short names ----
;; YYYY-MM-DD family
(w/nocase
    ;; Year
     (=> year (= 4 digit)) (: (* space) (? ("-_,;:/")) (* space))
    ;; Month
    (=> mon (= 2 digit)) (: (* space) (? ("-_,;:/")) (* space))
    ;; Day
    (=> day (= 2 digit))
    ;; Time
    (? (? "T") (: (* space) (? ("-_,;:/")) (* space))
        ;; Hour
        (=> hour (= 2 digit)) (: (* space) (? ("-_,;:/")) (* space))
        ;; Minute
        (=> min (= 2 digit)) (: (* space) (? ("-_,;:/")) (* space))
        ;; Second
        (=> sec (= 2 digit))
        ;; Timezone
            (? (: (* space) (? ("-_,;:/")) (* space))
                (=> tz (or (: ("+-") (= 2 digit) ("013") ("05"))
                (: word "/" word) (: (? "(") (= 3 alpha) (? ")")))))))

;; ---- Using the English names that I am suggesting ----
;; YYYY-MM-DD family
(w/nocase
    ;; Year
     (named year (greedy 4 4 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
    ;; Month
    (named mon (greedy 2 2 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
    ;; Day
    (named day (greedy 2 2 digit))
    ;; Time
    (maybe (maybe "T") (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
        ;; Hour
        (named hour (greedy 2 2 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
        ;; Minute
        (named min (greedy 2 2 digit)) (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
        ;; Second
        (named sec (greedy 2 2 digit))
        ;; Timezone
            (maybe (seq (greedy 0 space) (maybe ("-_,;:/")) (greedy 0 space))
                (named tz (or (seq ("+-") (greedy 2 2 digit) ("013") ("05"))
                (seq word "/" word) (seq (maybe "(") (greedy 3 3 alpha) (maybe ")")))))))

;; ---- Using the English names that John proposed ----
;; YYYY-MM-DD family
(w/nocase
    ;; Year
     (submatch-named year (exactly 4 digit))
    (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
    ;; Month
    (submatch-named mon (exactly 2 digit))
    (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
    ;; Day
    (submatch-named day (exactly 2 digit))
    ;; Time
    (optional (optional "T") (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
        ;; Hour
        (submatch-named hour (exactly 2 digit))
        (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
        ;; Minute
        (submatch-named min (exactly 2 digit))
        (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
        ;; Second
        (submatch-named sec (exactly 2 digit))
        ;; Timezone
            (optional (seq (zero-or-more space) (optional ("-_,;:/")) (zero-or-more space))
                (submatch-named tz (or (seq ("+-") (exactly 2 digit) ("013") ("05"))
                (seq word "/" word) (seq (optional "(") (exactly 3 alpha) (optional ")")))))))