
text processes vs. string procedures

This page is part of the web mail archives of SRFI 13 from before July 7th, 2015. The new archives for SRFI 13 contain all messages, not just those from before July 7th, 2015.



I agree with almost all of Sergei's msg.

- Basic string procs should *not* require textual well-formedness in a Unicode
  world. A string full of accents and umlauts and cedillas with no preceding
  base or start character is still a legal string.

- Full Unicode support will certainly require other procedures not in the 
  SRFI-13 spec. Sergei's examples of canonical & compatibility decomposition
  and composition are good ones. These should go in a Unicode-specific
  library, which is not the goal of SRFI-13.

- We also certainly need to do a new char library. Or perhaps a pair of them:
  one generic one, and one for Unicode-specific things. 

- However, I think case-mapping and string-comparison are basic things, and
  they can be given a generic, portable definition independent of the
  underlying character encoding. Case-mapping does *not* require strings to be
  well-formed text. ASCII, Latin-1 and Unicode all provide clear,
  language-independent definitions of this operation.

  I don't want the string library to be minimal. I want it to be useful.
  People -- many of whom currently program with Latin-1 or ASCII Schemes --
  case-map and compare strings frequently. These operations can be provided
  with an API which is portable across ASCII, Latin-1 and Unicode. So there's
  no barrier here.
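Here is a minimal sketch of the point about case-mapping and comparison. It is written in Python purely for illustration. Python's str.upper and str.casefold stand in for generic case-mapping and case-insensitive comparison procedures. This is not the SRFI-13 API, and case_insensitive_equal is a hypothetical helper. The example string deliberately begins with a combining cedilla that has no base character, and both operations still behave sensibly on it.

```python
import unicodedata

def case_insensitive_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings ignoring case.

    Uses Python's full case folding as a stand-in for a generic
    case-insensitive comparison. Nothing here checks that the strings
    are well-formed text.
    """
    return a.casefold() == b.casefold()

# A string that begins with a combining cedilla (U+0327), so that mark
# has no base character. It is still a legal string.
orphan = "\u0327fa\u00e7ade"
assert unicodedata.combining(orphan[0]) > 0  # nonzero class = combining mark

# Case-mapping works character by character. The orphaned mark has no
# case, so it passes through unchanged.
print(orphan.upper() == "\u0327FA\u00c7ADE")                # True
print(case_insensitive_equal(orphan, "\u0327FA\u00c7ADE"))  # True
```

No well-formedness check is needed anywhere. The same code would work unchanged on an ASCII-only or Latin-1-only string.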
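For contrast, here is a sketch of the Unicode-specific operations Sergei mentioned. It is again in Python purely for illustration and uses the standard unicodedata module. Canonical and compatibility decomposition and composition depend on Unicode's character database. That is why they fit a separate Unicode library rather than a generic string library.

```python
import unicodedata

# Canonical decomposition (NFD) splits a precomposed character into a
# base character plus combining marks. Canonical composition (NFC)
# puts them back together.
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", precomposed)
print([hex(ord(c)) for c in decomposed])                         # ['0x65', '0x301']
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True

# Plain string comparison is code-point based, so the two spellings
# differ until one side is normalized.
print(decomposed == precomposed)                                 # False

# Compatibility decomposition (NFKD) also folds compatibility variants.
# For example, the ligature U+FB01 becomes the two letters "f" and "i".
print(unicodedata.normalize("NFKD", "\ufb01"))                   # fi
```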