This page is part of the web mail archives of SRFI 91 from before July 7th, 2015. The new archives for SRFI 91 contain all messages, not just those from before July 7th, 2015.
Thomas Bushnell BSG <firstname.lastname@example.org> writes:

>> Then it's impossible to implement a UTF-8 encoder. There is an
>> infinite number of potential characters, and there is no way to
>> examine what a given character means.
>
> What exactly makes it impossible? There are an infinity of possible
> integers, and this hasn't hampered the implementation of <.

Integers support arithmetic. Individual bits or larger digits of an integer can be counted and examined. You can index dictionaries by integers. Any reasonable function on integers can be expressed by composing primitive operations.

What operations would your characters support? I guess operations similar to those of today's strings: determining the length, extracting individual code points, and some way to build them from code points (e.g. making a singleton from a given code point and appending a code point to a character, assuming that characters are immutable). There would also have to be some rules of normalization: if NFC and NFD are indistinguishable, then extracting individual code points would not necessarily yield the code points used to construct the character.

But wouldn't it be simpler to just use strings of code points wherever you would use characters? Strings of code points are needed anyway when we work on a lower level, e.g. when we care whether the output is NFC or NFD. So why not just make a library which provides iteration over strings using substrings representing characters, normalization etc.: the same functionality, but without calling some groups of code points "characters"?

>> established practice of using code points or even lower level code
>> units as Scheme characters.
>
> There is no "established practice" of doing this. The established
> practice is to pretend that code points and abstract characters are
> the same.

This is exactly what I said. The established practice is to work in terms of code points or even lower level.
You want to call the characters of this simplified view by the more formal term "code points", and to call certain well-formed sequences of combining characters "characters". Apart from changing names, what does it accomplish? Changing practice just to have nicer procedure names is a weak excuse.

-- 
   __("<         Marcin Kowalczyk
   \__/       email@example.com
    ^^     http://qrnik.knm.org.pl/~qrczak/
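
To make the library approach from the message above concrete, here is a minimal sketch in Python (not Scheme, and not part of the original post). It assumes only the standard `unicodedata` module; the helper names `combining_sequences` and `same_text` are hypothetical, and the splitting rule (a base code point plus any following combining marks) is a rough approximation rather than full Unicode text segmentation.

```python
import unicodedata


def combining_sequences(s):
    """Split a string of code points into substrings, each made of one
    code point followed by any combining marks that come after it.

    Hypothetical helper: this is a rough approximation and does not
    implement the full grapheme cluster rules of UAX #29.
    """
    parts = []
    for cp in s:
        # combining() returns a nonzero combining class for combining marks.
        if parts and unicodedata.combining(cp) != 0:
            parts[-1] += cp
        else:
            parts.append(cp)
    return parts


def same_text(a, b):
    """Compare two strings while ignoring the NFC/NFD distinction
    (hypothetical helper; both sides are normalized to NFC first)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# At the code point level the two strings are different ...
print(len(composed), len(decomposed))
print(composed == decomposed)

# ... but they compare equal once normalization is applied.
print(same_text(composed, decomposed))

# Iteration by substrings: the decomposed form stays together as one unit.
print([len(part) for part in combining_sequences("e\u0301a")])
```

Everything here operates on ordinary strings of code points. The substrings returned by `combining_sequences` play the role of the proposed "characters", and the lower-level view stays available for code that cares whether its output is NFC or NFD.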