
Re: Case-mapping, Unicode & internationalisation

This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.



I believe that UPCASE-STRING, DOWNCASE-STRING, and
TITLECASE-STRING belong to a separate domain of 'text processes'
that should be addressed in separate SRFIs. I think that the best approach
in a Unicode context is to treat Scheme strings as just arrays of characters
('code points') with no special well-formedness constraints; for example,
it should be legal to have a string consisting of combining characters
with no preceding base character, or a string with a high-half surrogate
character not followed by a low-half surrogate character.
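
To make the "array of code points" model concrete, here is a small sketch
in Python rather than Scheme, used only as an analogy: Python's str type
happens to follow the same model. The helper name is_encodable is
hypothetical and not part of any SRFI. It shows that such strings can be
built and indexed freely, and that well-formedness only becomes a question
when the string is handed to an encoder.

```python
# Analogy only: Python's str is a sequence of code points, much like the
# string model proposed above. Nothing here is Scheme or SRFI 13 API.

lone_combining = "\u0301"   # COMBINING ACUTE ACCENT with no base character
lone_surrogate = "\ud800"   # an unpaired surrogate code point

# Both are accepted as ordinary string elements.
s = lone_combining + "a" + lone_surrogate
length = len(s)
code_points = [ord(c) for c in s]


def is_encodable(text: str) -> bool:
    """Hypothetical helper: report whether text can be encoded as UTF-8.

    Well-formedness is checked here, at the encoding boundary, and not
    when the string is constructed.
    """
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False
```

Under this model the string layer stays simple, and any stricter notion of
well-formed text can be enforced by a separate layer on top of it.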
   A "string" library can contain relatively simple procedures that are
useful in traditional applications; it can also serve as a basis for
building 'text processes' described in the Unicode standard.
   A "char" library can contain procedures to access character properties
described in the Unicode database.
   A "text" library can include the 'text' data type representing well-formed
character sequences and allowing effective implementation of text processes
plus all the necessary primitives to work with this data type.
   A "basic text processes" library can contain specification/implementation
of canonical and compatibility decomposition based on text primitives.
   Other libraries can implement other text processes, including case
mapping, locating text element boundaries, and collation for different
languages.
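
The layering above can be illustrated with a rough sketch, again in Python
as an analogy and not as a proposed Scheme interface. The function names
char_info, canonical_decompose and compatibility_decompose are hypothetical;
the underlying calls come from Python's standard unicodedata module. The
last lines hint at why case mapping fits better among the text processes
than in a simple string library: the result can differ in length from the
input.

```python
import unicodedata


# "char" layer: per-character properties from the Unicode database.
def char_info(ch: str) -> tuple:
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


# "basic text processes" layer: decomposition built on character data.
def canonical_decompose(text: str) -> str:
    """Canonical decomposition (normalization form NFD)."""
    return unicodedata.normalize("NFD", text)


def compatibility_decompose(text: str) -> str:
    """Compatibility decomposition (normalization form NFKD)."""
    return unicodedata.normalize("NFKD", text)


# Case mapping is not a character-by-character operation:
# uppercasing one character can yield two.
sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
upper = sharp_s.upper()     # length changes from 1 to 2
```

For example, canonical_decompose("\u00e9") splits the precomposed letter
into a base letter followed by a combining accent, while the ligature
"\ufb01" is only taken apart by compatibility_decompose.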

-- Sergei