Re: words, punctuation, and whitespace
Aubrey Jaffer <agj@xxxxxxxxxxxx> writes:
> The first task in writing text-processing programs is to separate the
> input text into words, punctuation, and whitespace. Could R6RS deal
> with Unicode text as words, punctuation, and whitespace?
>
> Unicode-read port
>
> would return a word, punctuation, or whitespace object; or an
> eof-object.
An interesting idea. But I surely hope that you aren't assuming that
text consists of a bunch of words separated by whitespace and/or
punctuation. In some languages there is essentially no whitespace.
(For example, this is how Japanese books are traditionally printed.)
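
To make the concern concrete, here is a minimal sketch in Python (not Scheme, and not a proposed R6RS interface) of a reader that classifies characters by Unicode general category and groups them into runs. The names `classify` and `tokens` are hypothetical, and the three-way split is only my assumption about what the proposal intends.

```python
import unicodedata
from itertools import groupby

def classify(ch):
    """Coarse class of one character (assumed three-way split)."""
    if ch.isspace():
        return "whitespace"
    # General categories Pc, Pd, Ps, Pe, Pi, Pf, Po all begin with "P".
    if unicodedata.category(ch).startswith("P"):
        return "punctuation"
    # Everything else (letters, digits, symbols, marks) is lumped into "word".
    return "word"

def tokens(text):
    """Yield (class, run) pairs for maximal runs of same-class characters."""
    for kind, run in groupby(text, key=classify):
        yield kind, "".join(run)

if __name__ == "__main__":
    for sample in ("Hello, world!", "吾輩は猫である。名前はまだ無い。"):
        print(list(tokens(sample)))
```

On the English sample this splits roughly as one would expect. On the Japanese sample each "word" it reports is an entire clause, because nothing but the full stop separates the characters. Finding actual word boundaries there needs a dictionary or similar language-specific knowledge, which a purely category-based reader does not have.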