218: Unicode Numerals

by John Cowan (text) and Arvydas Silanskas (implementation)

Status

This SRFI is currently in withdrawn status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-218@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Author's summary of reasons for withdrawal:

  1. It only handles a limited number of cases, basically those where localized numbers are completely isomorphic to the European digits 0-9.
  2. It is not suitable as a building block for localized numeric parsers and formatters.

Abstract

These procedures allow the creation and interpretation of numerals using any set of Unicode digits that support positional notation.

Rationale

Although the positional decimal numeral system most widely used to write numbers is often called the Hindu-Arabic numeral system, the form of the digits 0-9 that evolved in Europe and is now used worldwide is not their only possible representation. In particular, that form is not usually used with either the various Indic scripts or the Arabic script. The digits used instead are functionally identical but differ in shape, and each script has its own set of digit characters in Unicode. For example, the number 12345 is written as ۱۲۳۴۵ in Eastern Arabic digits (used with Persian, Urdu, and other languages), and १२३४५ in Devanagari digits (used with Hindi and other languages).

Although R7RS-small Scheme permits non-European digits to be used in identifiers, there is very little support for using them in numbers. The digit-value procedure allows converting a single decimal digit character to its numeric value: thus (digit-value #\५) => 5, because ५ is Devanagari digit 5. (The digit can be specified as #\x096B instead.) This SRFI allows numbers of arbitrary types to be converted from and to any digit set.
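As a rough illustration of what R7RS alone offers, the sketch below calls digit-value on single characters and then folds it over a string by hand. The helper name digits->integer is hypothetical and is not part of this SRFI or of R7RS. The results assume an implementation whose (scheme char) library covers the full Unicode repertoire.

```scheme
(import (scheme base) (scheme char) (scheme write))

;; digit-value returns 0-9 for a decimal digit character in any script
;; and #f for every other character.
(display (digit-value #\x096B)) (newline)   ; DEVANAGARI DIGIT FIVE: 5
(display (digit-value #\a)) (newline)       ; not a digit: #f

;; Hypothetical helper: build a non-negative integer one digit at a time.
;; Returns #f as soon as a non-digit character is seen.
;; Note that 0 is a true value in Scheme, so the => clause also fires
;; for zero digits.
(define (digits->integer str)
  (let loop ((chars (string->list str)) (acc 0))
    (cond ((null? chars) acc)
          ((digit-value (car chars))
           => (lambda (d) (loop (cdr chars) (+ (* 10 acc) d))))
          (else #f))))

(display (digits->integer "१२३४५")) (newline)   ; 12345
```

This helper handles only unsigned integers and does not check that all digits come from the same script, which is part of the gap the procedures in this SRFI were meant to fill.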

No support is provided for bases other than 10, because such bases are rarely used with any non-European digit set, and because it is unclear what characters should be used to represent digits greater than 9. Likewise, there is no support for numerals that are not positional, such as Roman numerals or traditional Tamil numerals, which have nothing corresponding to 0 but do have numerals for 10, 100, and 1000, so that 2718 would be ௨௲௭௱௰௮, literally "2 1000 7 100 10 8".
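To make the Tamil example concrete, the literal reading "2 1000 7 100 10 8" can be evaluated as multiplier-and-unit pairs, where a unit with no multiplier counts once. This is only the arithmetic behind that one numeral, not a parser for the traditional system.

```scheme
(import (scheme base) (scheme write))

;; 2 x 1000, 7 x 100, a bare 10, and a final 8.
(display (+ (* 2 1000) (* 7 100) 10 8))   ; 2718
(newline)
```

Because the value of each sign depends on the unit that follows it rather than on its position, a digit-for-digit substitution cannot produce or read such numerals.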

Specification

(number->numeral z zero)
(numeral->number string zero)

These procedures behave identically to number->string and string->number from the (scheme base) library, except that where number->string generates, and string->number accepts, a 0 character, these procedures generate and accept a character equal to zero. Similarly, the successor (in Unicode ordering) of zero is generated and accepted in place of 1, the successor of the successor of zero is generated and accepted in place of 2, and so on.

If string->number would return #f on string, so does numeral->number.

It is an error if zero is not one of the characters with Unicode general category equal to Nd (decimal digit) and numeric value equal to 0.

Examples:

(number->numeral 3.1415 #\x9E6) => "৩.১৪১৫"    ; BENGALI DIGIT ZERO
(numeral->number "๓๕๕/๑๑๓" #\xE50) => 355/113   ; THAI DIGIT ZERO
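The substitution described above can be sketched in portable R7RS as a character-by-character translation around number->string and string->number. This is a minimal sketch and not the sample implementation. The names number->numeral-sketch, numeral->number-sketch, and zero-digit? are hypothetical. Signaling an error for an invalid zero is one possible choice, since the specification only says that it is an error. As a simplification, European digits in the input to numeral->number-sketch pass through unchanged, so mixed-script input is accepted.

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Hypothetical check: digit-value returns 0 only for a decimal digit zero.
(define (zero-digit? c)
  (eqv? (digit-value c) 0))

;; Replace each European digit produced by number->string with the
;; character at the same offset from zero.
(define (number->numeral-sketch z zero)
  (unless (zero-digit? zero)
    (error "not a decimal digit zero" zero))
  (let ((base (char->integer zero))
        (ascii-zero (char->integer #\0)))
    (string-map
     (lambda (c)
       (if (char<=? #\0 c #\9)
           (integer->char (+ base (- (char->integer c) ascii-zero)))
           c))
     (number->string z))))

;; Map the ten characters starting at zero back to European digits,
;; then let string->number do the parsing (it returns #f on failure).
(define (numeral->number-sketch str zero)
  (unless (zero-digit? zero)
    (error "not a decimal digit zero" zero))
  (let ((base (char->integer zero))
        (ascii-zero (char->integer #\0)))
    (string->number
     (string-map
      (lambda (c)
        (let ((offset (- (char->integer c) base)))
          (if (<= 0 offset 9)
              (integer->char (+ ascii-zero offset))
              c)))
      str))))

(display (number->numeral-sketch 12345 #\x966)) (newline)     ; १२३४५
(display (numeral->number-sketch "๓๕๕/๑๑๓" #\xE50)) (newline) ; 355/113
```

Signs, the decimal point, the fraction slash, and exponent markers are left untouched, which matches the description that only the digit characters differ from number->string and string->number.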

Implementation

The sample implementation is found in the repository of this SRFI.

Acknowledgements

Thanks to the Unicode Consortium, who made this SRFI possible, and to the participants on the SRFI mailing list.

© 2020 John Cowan and Arvydas Silanskas.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler