
Re: words, punctuation, and whitespace

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: Aubrey Jaffer <agj@xxxxxxxxxxxx>
Subject: Re: words, punctuation, and whitespace
From: Thomas Bushnell BSG <tb@xxxxxxxxxx>
Date: Tue, 19 Jul 2005 20:09:22 -0700
Cc: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <20050720025104.DE1D21B77B4@xxxxxxxxxxxxxxxx> (Aubrey Jaffer's message of "Tue, 19 Jul 2005 22:51:04 -0400 (EDT)")
References: <20050720025104.DE1D21B77B4@xxxxxxxxxxxxxxxx>
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

Aubrey Jaffer <agj@xxxxxxxxxxxx> writes:

> The first task in writing text-processing programs is to separate the
> input text into words, punctuation, and whitespace.  Could R6RS deal
> with Unicode text as words, punctuation, and whitespace?
>
>   Unicode-read port
>
> would return a word, punctuation, or whitespace object; or an
> eof-object.

An interesting idea.  But I surely hope that you aren't assuming that
text consists of a bunch of words separated by whitespace and/or
punctuation.  In some languages there is essentially no whitespace.
(For example, this is how Japanese books are traditionally printed.)
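
To make the concern concrete, here is a minimal Python sketch of a splitter
that sorts characters into word, punctuation, and whitespace runs using only
Unicode general categories.  The names `classify` and `tokenize` are
hypothetical and are not part of any SRFI or R6RS proposal; this is only an
illustration of the naive approach, under the assumption that anything which
is neither whitespace nor punctuation counts as part of a word.

```python
import unicodedata
from itertools import groupby


def classify(ch):
    """Return a coarse class for one character (illustrative assumption)."""
    if ch.isspace():
        return "whitespace"
    # General categories starting with "P" (Po, Pd, Ps, ...) are punctuation.
    if unicodedata.category(ch).startswith("P"):
        return "punctuation"
    # Everything else (letters, digits, symbols) is lumped into "word".
    return "word"


def tokenize(text):
    """Split text into (class, run) pairs of consecutive same-class characters."""
    return [(kind, "".join(run)) for kind, run in groupby(text, key=classify)]


english = tokenize("Hello, world!")
japanese = tokenize("吾輩は猫である。名前はまだ無い。")
```

For the English input this yields five runs: two words, two punctuation
marks, and one space.  For the Japanese input each sentence comes back as a
single "word" run followed by the full stop, even though each sentence
contains several words, because nothing in the text marks the word
boundaries.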

References:
- words, punctuation, and whitespace
  - From: Aubrey Jaffer
