This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
Alex Shinn has made two suggestions about terminology. He objects to the use I've made of the words "letter" and "ideograph". Both words are used in the SRFI-52 draft. The word "letter" is used in a proposed procedure name, CHAR-LETTER?.

Alex writes:

> "Letter," by all standard definitions and consistent with
> Unicode usage, specifically refers an element of an alphabet.
> It therefore would not apply to syllabic or ideographic
> characters.

Programmers building global computing environments need certain categories of characters which, historically, have been of little or no interest to linguists. One of these categories comprises many of the characters used to write words, whether those characters are alphabetic, syllabic, or ideographic. Linguistics hasn't given us a term for that category.

There is an easy example of why such a category is desirable in computing. Let's suppose that I'm going to specify the lexical syntax of identifiers in a programming language. As part of that specification, I'll need to identify this category. (For an example, see "Unicode Technical Report #31: Identifier and Pattern Syntax", http://www.unicode.org/reports/tr31/tr31-2.html)

In their wisdom (or absence of wisdom) the Unicode consortium chose a name for this category: they call these characters "letters". That _is_ an overloading of the term "letter" -- but it is an overloading that pervades the Unicode specifications and data tables.

For example, every assigned Unicode codepoint has a property called "the major class of its General Category". Alphabetic, syllabic, and ideographic characters all have the major class "L" (short for "letter").

The glossary of the Unicode 3.0 specification says:

    Letter. (1) An element of an alphabet. In a broad sense,
    includes elements of syllabaries and ideographs. (2) Informative
    property of characters that are used to write words.
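The "broad sense" can be checked against the Unicode data tables directly. What follows is only a rough sketch in Python, not anything taken from the SRFI-52 draft: the name `char_letter` is a hypothetical stand-in for CHAR-LETTER?, and the results depend on the version of the Unicode database bundled with the Python interpreter.

```python
import unicodedata


def char_letter(ch: str) -> bool:
    """Hypothetical analogue of CHAR-LETTER? (an assumption, not the SRFI's definition).

    Returns True when the major class of the character's General Category
    is "L", i.e. the category is one of Lu, Ll, Lt, Lm, or Lo.
    """
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    # An alphabetic, a syllabic, and an ideographic character,
    # followed by a digit and a punctuation character for contrast.
    for ch in ["a", "\u3042", "\u6f22", "1", "_"]:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), char_letter(ch))
```

Under this sketch a Latin letter, a Hiragana syllable, and a Han character all count as letters, while a digit and an underscore do not, which is the grouping an identifier syntax needs.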
I believe that this "broad sense" meaning of "Letter" is well ingrained in computing and that it _is_ the right term for the concept that SRFI-52 is attempting to convey.

Alex also writes:

> "Ideograph" applied to all Han characters is technically
> incorrect. Linguists prefer the term "sinogram" which refers to
> Chinese-derived characters. "Sinogram" fits all uses being
> applied to the term "ideograph" in these discussions (at least
> until Unicode adds hieroglyphs). Since the usage of ideograph
> is fairly ubiquitous, however, it may not be worth fighting it.

I am curious why you say that "ideograph" is inaccurate.

I do note that Han characters are not the only ideographic letters encoded in Unicode -- although I'm not sure there is a huge future in writing Scheme programs whose identifiers are spelled using the Linear B script :-)

-t