Re: Case-mapping, Unicode & internationalisation



I believe that UPCASE-STRING, DOWNCASE-STRING, and
TITLECASE-STRING belong to a separate domain of 'text processes'
that should be addressed in separate SRFIs. I think that the best approach
in a Unicode context is to treat Scheme strings as plain arrays of characters
('code points') with no special well-formedness constraints; for example,
it should be legal to have a string consisting of combining characters
with no preceding base character, or a string with a low-half surrogate
character not preceded by a high-half surrogate character.
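Python's built-in str happens to follow the model proposed here, so it can serve as a rough illustration (in Python rather than Scheme; the helper name utf8_encodable is hypothetical). A string may hold a combining mark with no base character or an unpaired surrogate, and the problem only surfaces when the string is encoded for interchange.

```python
import unicodedata

def utf8_encodable(s: str) -> bool:
    """Hypothetical helper: return True if s can be encoded as strict UTF-8."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True

# Python's str, like the strings proposed above, is a plain sequence of code
# points: neither of these values is rejected when it is constructed.
orphan_mark = "\u0301"      # COMBINING ACUTE ACCENT with no base character
lone_surrogate = "\udc00"   # low-half surrogate with no high-half before it

print(len(orphan_mark), unicodedata.combining(orphan_mark))   # 1 230
print(len(lone_surrogate), hex(ord(lone_surrogate)))          # 1 0xdc00

# Well-formedness only matters when the string is encoded for interchange.
print(utf8_encodable(orphan_mark))      # True
print(utf8_encodable(lone_surrogate))   # False
```

Keeping the check at the encoding boundary means string primitives never have to reject intermediate values that a text process might build up and later repair.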
   A "string" library can contain relatively simple procedures that are
useful in traditional applications; it can also serve as a basis for
building the 'text processes' described in the Unicode standard.
   A "char" library can contain procedures to access character properties
described in the Unicode database.
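As a sketch of what such property access looks like, Python's unicodedata module, which exposes part of the Unicode Character Database, can stand in for a Scheme "char" library. The wrapper char_properties is a hypothetical name, and the property set shown is only a sample of what the database provides.

```python
import unicodedata

def char_properties(ch: str) -> dict:
    """Hypothetical wrapper: a few UCD properties of one code point."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),       # Name property
        "category": unicodedata.category(ch),            # General_Category, e.g. "Lu"
        "combining": unicodedata.combining(ch),          # Canonical_Combining_Class
        "decomposition": unicodedata.decomposition(ch),  # raw decomposition field, "" if none
    }

for ch in ("A", "\u00e9", "\u0301"):
    print(f"U+{ord(ch):04X}", char_properties(ch))
```

For U+00E9 the decomposition field is "0065 0301", which is the raw data a decomposition library would build on.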
  A "text" library can include the 'text' data type, representing well-formed
character sequences and allowing efficient implementation of text processes,
plus all the primitives needed to work with this data type.
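One possible shape for such a data type, sketched in Python: a hypothetical immutable Text wrapper that validates its contents on construction. The well-formedness rules in is_well_formed (no surrogate code points, no combining mark at the start) are assumptions chosen for illustration, not rules taken from any SRFI.

```python
import unicodedata
from dataclasses import dataclass

def is_well_formed(s: str) -> bool:
    """Assumed rules, for illustration only: no surrogate code points at all,
    and no combining mark at the very start (no sequence without a base)."""
    if any(0xD800 <= ord(ch) <= 0xDFFF for ch in s):
        return False
    if s and unicodedata.category(s[0]).startswith("M"):
        return False
    return True

@dataclass(frozen=True)
class Text:
    """Hypothetical 'text' type: an immutable, validated code-point sequence."""
    chars: str

    def __post_init__(self) -> None:
        if not is_well_formed(self.chars):
            raise ValueError(f"ill-formed text: {self.chars!r}")

    def __add__(self, other: "Text") -> "Text":
        # Under the rules above, joining two well-formed texts is always
        # well-formed, so the check in __post_init__ cannot fail here.
        return Text(self.chars + other.chars)

    def __len__(self) -> int:
        return len(self.chars)

greeting = Text("caf") + Text("e\u0301")   # "café" in decomposed form
print(len(greeting))                        # 5
```

Because every Text value is checked once at construction, text processes written against this type can assume well-formed input instead of re-validating it.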
  A "basic text processes" library can contain the specification and
implementation of canonical and compatibility decomposition, based on the
text primitives.
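The difference between the two decompositions can be seen with Python's unicodedata.normalize, used here as a stand-in: "NFD" applies canonical decomposition and "NFKD" compatibility decomposition (both also reorder combining marks canonically). Python implements these in its runtime rather than on top of text primitives as proposed above; decompositions and code_points are hypothetical helper names.

```python
import unicodedata

def decompositions(s: str) -> tuple[str, str]:
    """Return (canonical, compatibility) decompositions, i.e. NFD and NFKD."""
    return unicodedata.normalize("NFD", s), unicodedata.normalize("NFKD", s)

def code_points(s: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for s in ("\u00e9", "\ufb01", "\u00b2"):   # e-acute, the "fi" ligature, superscript two
    nfd, nfkd = decompositions(s)
    print(code_points(s), "| NFD:", code_points(nfd), "| NFKD:", code_points(nfkd))
```

The precomposed e-acute decomposes the same way under both forms, while the ligature and the superscript only change under compatibility decomposition, which is why the two are specified separately.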
  Other libraries can implement other text processes, including case
mapping, locating text element boundaries, and collation for different
languages.
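Python's full case mapping helps show why case mapping is a text process rather than a per-character string operation: the result can be longer than the input, and the mapping of a character can depend on its neighbours. The helpers simple_upcase (a rough approximation of one-character-in, one-character-out case mapping) and combining_sequences (a crude boundary finder that is NOT the UAX #29 grapheme-cluster algorithm) are hypothetical. Python's str methods apply no language-specific tailoring such as the Turkish dotted and dotless i, and collation is not shown.

```python
import unicodedata

def simple_upcase(s: str) -> str:
    """Approximate char-by-char upcase: a character whose full uppercase
    mapping is longer than one character is left unchanged."""
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)

def combining_sequences(s: str) -> list[str]:
    """Crude text-element split: attach each combining mark (General_Category
    M*) to the preceding element. Not the UAX #29 algorithm."""
    elements: list[str] = []
    for ch in s:
        if elements and unicodedata.category(ch).startswith("M"):
            elements[-1] += ch
        else:
            elements.append(ch)
    return elements

word = "stra\u00dfe"                        # "straße", 6 code points
print(word.upper())                         # STRASSE (7 code points)
print(simple_upcase(word))                  # STRAßE
print("\u039f\u0394\u039f\u03a3".lower())   # οδος: final capital sigma becomes U+03C2
print(len(combining_sequences("cafe\u0301")))  # 4
```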

-- Sergei