
terminology

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.



Alex Shinn has made two suggestions about terminology.   He objects
to the use I've made of the words "letter" and "ideograph".  Both
words are used in the SRFI-52 draft.   The word "letter" is used in a
proposed procedure name, CHAR-LETTER?.

Alex writes:

    > "Letter," by all standard definitions and consistent with
    > Unicode usage, specifically refers an element of an alphabet.
    > It therefore would not apply to syllabic or ideographic
    > characters.

Programmers building global computing environments need certain
categories of characters which, historically, have been of little or
no interest to linguists.  One of these categories comprises many of
the characters used to write words, whether those characters are
alphabetic, syllabic, or ideographic.  Linguistics hasn't given us a
term for that category.

There is an easy example of why such a category is desirable in
computing.  Let's suppose that I'm going to specify the lexical syntax
of identifiers in a programming language.  As part of that
specification, I'll need to identify this category.  (For an example,
see "Unicode Technical Report #31: Identifier and Pattern Syntax",
http://www.unicode.org/reports/tr31/tr31-2.html)
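
As a rough illustration only (it is not part of the SRFI-52 draft):
Python documents its identifier syntax as based on that technical
report, so its built-in str.isidentifier() shows the category at work
across alphabetic, syllabic, and ideographic scripts.  The sample
strings are my own assumptions, chosen for the example.

```python
# Sample strings are illustrative assumptions, not taken from the draft.
samples = [
    "lambda_1",  # Latin alphabet
    "λ",         # Greek alphabet
    "かな",      # hiragana (a syllabary)
    "変数",      # Han characters
    "1abc",      # starts with a digit
    "a-b",       # contains a hyphen
]

# str.isidentifier() applies Python's own Unicode-aware identifier rules.
results = {s: s.isidentifier() for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

The first four strings are accepted and the last two are rejected; no
linguist's notion of "alphabet" separates those two groups.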

In their wisdom (or absence of wisdom) the Unicode Consortium chose a
name for this category: they call these characters "letters".  That
_is_ an overloading of the term "letter" -- but it is an overloading
that pervades the Unicode specifications and data tables.  For
example, every assigned Unicode codepoint has a property called "the
major class of its General Category".  The class of alphabetic,
syllabic, and ideographic characters has the major class "L" (short
for "letter").  The glossary of the Unicode 3.0 specification says:

        Letter. (1) An element of an alphabet.  In a broad sense,
        includes elements of syllabaries and ideographs.  (2)
        Informative property of characters that are used to write
        words.

I believe that this "broad sense" meaning of "Letter" is well
ingrained in computing and that it _is_ the right term for the
concept that SRFI-52 is attempting to convey.
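
A minimal sketch of that "broad sense" test, assuming a predicate that
simply checks for major class "L" in the General Category.  The name
char_letter is hypothetical, and the draft's CHAR-LETTER? need not be
defined this way; the sketch uses Python's standard unicodedata module.

```python
import unicodedata


def char_letter(ch: str) -> bool:
    """Return True if ch's General Category has major class "L".

    Hypothetical helper: this is an assumed reading of the "broad sense"
    of "letter", not the definition given in the SRFI-52 draft.
    """
    # unicodedata.category() returns a two-letter code such as "Lu", "Lo",
    # "Nd" or "Sm"; the first letter is the major class.
    return unicodedata.category(ch).startswith("L")


examples = {ch: char_letter(ch) for ch in ["a", "か", "字", "7", "+"]}

for ch, is_letter in examples.items():
    print(ch, unicodedata.category(ch), is_letter)
```

Here "a" (alphabetic), "か" (syllabic), and "字" (ideographic) all fall
in major class "L", while "7" and "+" do not.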


Alex also writes:

    > "Ideograph" applied to all Han characters is technically
    > incorrect.  Linguists prefer the term "sinogram" which refers to
    > Chinese-derived characters.  "Sinogram" fits all uses being
    > applied to the term "ideograph" in these discussions (at least
    > until Unicode adds hieroglyphs).  Since the usage of ideograph
    > is fairly ubiquitous, however, it may not be worth fighting it.

I have an intellectual curiosity about why you say that "ideograph"
is inaccurate.

I do note that Han characters are not the only ideographic letters
encoded in Unicode --  although I'm not sure there is a huge future in
writing Scheme programs whose identifiers are spelled using the Linear
B script :-)


-t