This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Having written many text-processing applications in Scheme, I have found plain R5RS poorly suited to "bespoke" parsers, so I use several SLIB modules for string-level infrastructure:

  (require 'string-search)
  (require 'string-port)
  (require 'string-case)
  (require 'line-i/o)

These SRFI-75 discussions of character attributes are leading me to believe that, knowing only one language well, I will be unable to write language-portable programs.

But why are we working at the character or even the string level? The first task in writing a text-processing program is to separate the input text into words, punctuation, and whitespace.

Could R6RS deal with Unicode text as words, punctuation, and whitespace? A procedure named `Unicode-read', applied to a port, would return a word, punctuation, or whitespace object, or an eof-object. A procedure named `Unicode-write' or `Unicode-display' would write a word, punctuation, or whitespace object to a port. Perhaps `display' can serve this purpose.

With case-sensitivity, symbols look like good candidates for word objects. Words as symbols would seem to make multilingual Scheme programs possible. Lists or vectors of these objects would represent multilingual text compactly, without character-size or encoding issues.

As evidence that one can deal with multilanguage text at a high level, consider http://swiss.csail.mit.edu/~jaffer/Scheme.html.jis. Although I know no Japanese, I cobbled together this Japanese and English page by cutting and pasting from Japanese web pages.
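
To make the word/punctuation/whitespace proposal above concrete, here is a rough sketch of what such a reader and writer might look like in portable R5RS. Everything in it is an assumption for illustration: the names `unicode-read', `unicode-display', `unicode-read-all', and `char-class', the choice of symbols for words, and the (class . string) pairs for punctuation and whitespace are not taken from SRFI 75 or any R6RS draft. It also relies on the host's `char-alphabetic?', `char-numeric?', and `char-whitespace?' for classification, so it is only as Unicode-aware as the implementation underneath, and it would not segment scripts such as Japanese that do not separate words with whitespace.

```scheme
;; Hypothetical sketch; none of these names are part of SRFI 75 or R6RS.

;; Classify a character as word, whitespace, or punctuation material.
;; Relies on the host implementation's character predicates.
(define (char-class c)
  (cond ((char-whitespace? c) 'whitespace)
        ((or (char-alphabetic? c) (char-numeric? c)) 'word)
        (else 'punctuation)))

;; Read one token from PORT.  Returns the eof-object at end of input,
;; a symbol for a word, or a pair (class . string) for whitespace and
;; punctuation.  Each punctuation character is its own token; runs of
;; word characters or of whitespace are grouped together.
(define (unicode-read port)
  (let ((c (peek-char port)))
    (if (eof-object? c)
        c
        (let ((class (char-class c)))
          (let loop ((chars (list (read-char port))))
            (let ((next (peek-char port)))
              (if (and (not (eof-object? next))
                       (not (eq? class 'punctuation))
                       (eq? (char-class next) class))
                  (loop (cons (read-char port) chars))
                  (let ((text (list->string (reverse chars))))
                    (if (eq? class 'word)
                        (string->symbol text)
                        (cons class text))))))))))

;; Write a token produced by unicode-read back to PORT.
(define (unicode-display token port)
  (if (symbol? token)
      (display (symbol->string token) port)
      (display (cdr token) port)))

;; Read tokens until end of input and return them as a list.
(define (unicode-read-all port)
  (let loop ((tokens '()))
    (let ((token (unicode-read port)))
      (if (eof-object? token)
          (reverse tokens)
          (loop (cons token tokens))))))
```

With SLIB's string-port module loaded, (call-with-input-string "Hello, world" unicode-read-all) should yield a four-element list: the symbol Hello, a punctuation object for the comma, a whitespace object for the space, and the symbol world. Passing each element to `unicode-display' in order reproduces the original text.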