This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.
On Tue, 10 Feb 2004, Tom Lord wrote:

>Programmers building global computing environments have need for
>certain categories of characters which historically, are of little or
>no interest to linguists. One of these categories is comprised of
>many of the characters used to write words, whether those characters
>are alphabetic, syllabic, or ideographic. Linguistics hasn't given
>us a term for that category.

Well, actually it has. Linguists call these categories "glyphs" or "graphemes", usually with varying degrees of precision or varying exact meaning depending on the context or speaker. Generally "grapheme" is considered less specific than "glyph", in the sense that many different arrangements of ink on paper (or notches in bark, or carved grooves in stone or whatever) can represent the same grapheme, while each minor variation creates a different glyph. For example, "A" is a grapheme; a particular example of a Lucida ten-point sans-serif A printed on a particular piece of paper is a glyph.

But the line between them is blurry. Is the bit-pattern a glyph before it's printed? Is a font containing the patterns for the glyphs that will be printed when it's used composed of glyphs, or graphemes? And so on. There's a big fuzzy area between the concrete realization of a particular instance of a grapheme (unambiguously a glyph) and the abstract idea of a minimal written unit of language or printed communication (unambiguously a grapheme), and we tend to redraw the line between them depending on exactly which levels of abstraction we need to distinguish between for a particular application.

>In their wisdom (or absence of wisdom) the Unicode consortium chose a
>name for this category: they call these characters "letters".

They were starved for names. They were already using "grapheme" and "glyph" in ways that didn't allow their reuse. Besides, they wanted to exclude some categories which are graphemes, such as punctuation.
> That
> _is_ an overloading of the term "letter" -- but it is an overloading
> that pervades the Unicode specifications and data tables. For
> example, every assigned Unicode codepoint has a property called "the
> major class of its General Category". The class of alphabetic,
> syllabic, and ideographic characters has the major class "L" (short
> for "letter"). The glossary of the Unicode 3.0 specification says:
>
>     Letter. (1) An element of an alphabet. In a broad sense,
>     includes elements of syllabaries and ideographs. (2)
>     Informative property of characters that are used to write
>     words.
>
> I believe that this "broad sense" meaning of "Letter" is well
> engrained in computing and that it _is_ the right term for the
> concept that SRFI-52 is attempting to convey.

Probably; language changes, and "letter" is rapidly coming to have the broader meaning. It will probably be correct usage no later than "alot of", and in the meantime will probably cause fewer people to grind their teeth. Besides, it's not the first time computer programmers have taken a word in common use and given it a restricted, technical definition that's not quite what the people already using it understand by it.

Bear
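
The "major class" of the General Category mentioned in the quoted text can be inspected directly. The following is a minimal sketch in Python rather than Scheme, assuming the standard `unicodedata` module; it reports the Unicode version bundled with the interpreter, not the Unicode 3.0 data cited above. The helper names `major_class` and `is_letter` are hypothetical and are not part of SRFI 52.

```python
import unicodedata


def major_class(ch: str) -> str:
    """Return the first letter of the character's General Category."""
    return unicodedata.category(ch)[0]


def is_letter(ch: str) -> bool:
    """True when the character is a "letter" in the broad Unicode sense."""
    return major_class(ch) == "L"


# One alphabetic, one syllabic, and one ideographic character,
# followed by punctuation and a digit for contrast.
for ch in ["A", "あ", "漢", ",", "7"]:
    print(repr(ch), unicodedata.category(ch), is_letter(ch))
```

Alphabetic, syllabic, and ideographic characters all fall under the major class "L", while punctuation falls under "P", which matches the point that Unicode "letters" leave out some graphemes.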