Re: w/ascii and w/unicode


To: Michael Montague <mikemon@xxxxxxxxx>

Subject: Re: w/ascii and w/unicode

From: Alex Shinn <alexshinn@xxxxxxxxx>

Date: Thu, 17 Oct 2013 17:52:37 +0900

Cc: SRFI-115 discussion list <srfi-115@xxxxxxxxxxxxxxxxx>

In-reply-to: <525F5A9C.2040506@gmail.com>

References: <525F5A9C.2040506@gmail.com>

On Thu, Oct 17, 2013 at 12:33 PM, Michael Montague <mikemon@xxxxxxxxx> wrote:

> Why are w/ascii and w/unicode necessary? The ascii character set can be used instead.
>
> (regexp-search `(: bos (* ,char-set:ascii) eos) "English") => #<rx-match>
> (regexp-search `(: bos (* ,char-set:ascii) eos) "Ελληνική") => #f

You seem to be misunderstanding these operators. They apply to all contained patterns. The examples you are referring to are operating on the "letter" character class. You could, if you wanted, use intersection to restrict individual sets to ASCII-only:

(regexp-search `(: bos (* (& ascii letter)) eos) "English") => #<rx-match>
(regexp-search `(: bos (* (& ascii letter)) eos) "Ελληνική") => #f

(regexp-search `(: bos (* letter) eos) "Ελληνική") => #<rx-match>

However, the intersection needs to be repeated for every cset if there are multiple nested csets, and is in fact impossible if the nested cset is part of an external SRE, e.g. intersection can't replace w/ascii here:

(import (only (mystuff regexp-common) rx:plurals))

(regexp-search `(w/ascii ,rx:plurals) "...")
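
To make the scoping concrete, here is a minimal sketch, assuming an implementation that provides (srfi 115) with Unicode-aware character classes. The name rx:word is a hypothetical stand-in for a pattern exported by another library (such as rx:plurals above), and matches? is a helper introduced only for this illustration.

```scheme
(import (scheme base) (scheme write) (srfi 115))

;; Hypothetical stand-in for an SRE defined in some other library.
;; The caller cannot reach inside it to add (& ascii ...) by hand.
(define rx:word '(: bos (+ letter) eos))

;; Illustration-only helper: reduce a match object to a boolean.
(define (matches? sre str)
  (if (regexp-search sre str) #t #f))

;; Unwrapped: letter keeps the implementation's default meaning.
(write (matches? rx:word "Ελληνική"))
(newline)

;; Wrapped: w/ascii applies to every cset inside rx:word,
;; without rx:word itself being modified.
(write (matches? `(w/ascii ,rx:word) "Ελληνική"))
(newline)
(write (matches? `(w/ascii ,rx:word) "English"))
(newline)
```

With a Unicode-aware letter class, the unwrapped pattern should accept the Greek word, while the two w/ascii calls should accept only "English". The wrapped SRE is used as-is in both cases.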

Alex
