SRFI 238: Codesets

by Lassi Kortela

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-238@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2022-10-24
Draft #1 published: 2022-11-03
Draft #2 published: 2022-12-13
Draft #3 published: 2023-01-03
Finalized: 2023-01-16

Abstract

Many programming interfaces rely on a set of condition codes where each code has a numeric ID, a mnemonic symbol, and a human-readable message. This SRFI defines a facility to translate between numbers and symbols in a codeset and to fetch messages by code. Examples are given using the Unix errno and signal codesets.

Table of contents

Rationale
Specification
Examples
Implementation

Rationale

Motivation

Instead of having a separate lookup interface for each codeset, with trivial variations between them, it is easier to have just one.

Goals

The goals of this SRFI are as follows.

Provide a comfortable user interface.
Support a wide range of implementation strategies and environments.
Impose a minimum of fuss on the implementer.

This SRFI defines a minimal foundation which can be supplied by even the most featherweight Scheme implementation. More features can be built on top of this foundation. In particular, this SRFI does not provide a way for users to define their own codesets. That is something a Scheme implementation with a foreign function interface should provide, and it should perhaps be standardized in another SRFI.

Convenience

The interface given in this SRFI prioritizes convenience and simplicity for the caller. Common codesets can be referred to using symbols; the user does not have to fetch a special codeset object before making queries. The interface is lenient, returning #f instead of raising an exception whenever information is unavailable.

Error handling is typically regarded as a chore into which programmers do not like to invest extra time. It is hoped that a forgiving interface encourages Scheme programmers to add user-friendly error reporting in more places.

Enumerations and predicates

It is not appropriate to define a standard list of codes for each codeset since the full set of values encountered in the wild is not easily fixed. For many codesets, especially those retrieved from a low-level language using a groveler, there is no reliable way to enumerate all the codes that might be manifest on a given system. New codes are often added by new versions of software, new editions of standards, and implementations which extend the standards.

For example, assume a hypothetical C library called libfoo whose version 1.1 adds the error code FOO_LOST_MOJO. A Scheme system built against libfoo version 1.0 would say that (codeset-symbol? 'libfoo 'FOO_LOST_MOJO) is false. But if the backward-compatible libfoo 1.1 is dynamically linked to that Scheme system, the claim becomes misleading.

For that reason the predicates codeset-symbol? and codeset-number? are not provided.

In a similar vein, some users would prefer that the code symbols come from a pre-defined enumeration. Such enumerations can be created in R⁶RS, for instance. The author's opinion is that Scheme permits users to easily create their own enumerations from whichever symbols they wish. Each application or library wishing to use an enumeration should list the codes that it needs.

Survey of prior art

This survey is limited to errno wrappers.

Gauche has two hash tables for errno lookup.
Chicken has an errno module exporting the constants.
GNU CLISP has an errno function that accepts either a number or a symbol and returns the other. It also has a strerror function.
The Standard ML Basis Library defines OS.syserror as an abstract data type. In SML/NJ the concrete type is an integer.
Python has an errno module exporting the constants and a lookup table.

There are sure to be countless similar interfaces in other systems.

Specification

Codeset objects

The procedures in this SRFI take a codeset argument. It can be either a symbol or some kind of implementation-defined object. It is an error to pass a codeset argument of some other type.

It is valid to pass a codeset symbol that does not refer to a codeset known to the implementation. In this case the unknown codeset is treated like an empty codeset. The codeset? procedure can be used to differentiate between unknown and known empty codesets.
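
As a rough sketch of the rule above, the following queries an unknown codeset. The symbol no-such-codeset is a hypothetical name, assumed not to be provided by the implementation, and the (srfi 238) library name assumes the usual R7RS naming convention for SRFI libraries.

```scheme
(import (scheme base) (srfi 238))

;; Hypothetical name, assumed NOT to be a codeset on this implementation.
(define unknown 'no-such-codeset)

;; Collect the answers to each kind of query in one list.
(define unknown-results
  (list (codeset? unknown)               ; not known to the implementation
        (codeset-symbols unknown)        ; treated like an empty codeset
        (codeset-symbol unknown 1)       ; no integer matches a code
        (codeset-number unknown 'EPERM)  ; no symbol matches a code
        (codeset-message unknown 1)))    ; no message is known
```

Following the rules in this section, unknown-results should be (#f () #f #f #f). A known but empty codeset would give the same answers except that codeset? would return #t.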

The data structure used to implement a codeset is unspecified. It is also unspecified whether information about a codeset is loaded on demand or ahead of time. The codesets provided by an implementation can mix different implementation strategies.

Codeset registry

Symbols denoting a codeset are tracked in the Scheme Registry.

Degenerate cases

The implementation is permitted to supply the following.

Empty codesets.
Codesets for which no messages are known.
Codesets for which only symbols are known. This still lets the user list the symbols and look up messages.
Codesets for which only numbers are known. This still lets the user look up messages.

Procedures

(codeset? object) => boolean

Return #t if object is either

a symbol naming a codeset known to the implementation, or
an implementation-defined codeset object.

Else return #f.

(codeset-symbols codeset) => symbol-list

Return a list of zero or more symbols in arbitrary order with no duplicates.

The list should comprise all symbols in codeset. In many cases there is no way to reliably enumerate all extant symbols. In those cases the list should contain as many as are known.

It’s up to the implementation whether the result list is fresh or an old list is re-used. It is an error for the caller to mutate the result.

(codeset-symbol codeset code) => symbol or #f

If code is an exact integer matching a code in codeset, return the corresponding symbol.

If code is some other exact integer, return #f.

If code is a symbol, return it as-is.

It is an error if code is something other than a symbol or an exact integer.

(codeset-number codeset code) => integer or #f

If code is a symbol matching a code in codeset, return the corresponding integer value.

If code is some other symbol, return #f.

If code is an exact integer, return it as-is.

It is an error if code is something other than a symbol or an exact integer.
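
One consequence of the pass-through rules is that a caller can accept a code in either representation and normalize it with a single call. The helper below is a hypothetical illustration, not part of this SRFI, and it assumes the library is importable as (srfi 238).

```scheme
(import (scheme base) (srfi 238))

;; Hypothetical helper: do A and B denote the same code in CODESET?
;; Each argument may be a symbol or an exact integer.
(define (same-code? codeset a b)
  (let ((number-a (codeset-number codeset a))
        (number-b (codeset-number codeset b)))
    ;; A symbol with no known number yields #f, so it never matches,
    ;; not even when A and B are the same unknown symbol.
    (and number-a number-b (= number-a number-b))))
```

On a system that behaves as in the Basic usage example, (same-code? 'errno 'EPERM 1) would be expected to return #t.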

(codeset-message codeset code) => string or #f

If code is a symbol or an exact integer matching a code in codeset, and a message is known for that code, return the message as a string.

If no message is known to be associated with code, return #f.

It is an error if code is something other than a symbol or an exact integer.

The implementation is free to return a message in any language. Extensions of this procedure should add an optional third argument for the locale. It's permitted to export such an extended procedure from this SRFI's library, but users of this SRFI shall not expect the extended argument to be available.

It’s up to the implementation whether the result string is fresh. It is an error for the caller to mutate the result.
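
To make the lookup rules concrete, here is a minimal sketch of the three lookup procedures over a toy table. It is not the sample implementation. The toy- names, the table layout, and the entries are all assumptions made for illustration, and the codeset is passed as a plain list rather than as a symbol.

```scheme
(import (scheme base))

;; Hypothetical toy codeset: a list of (number symbol message) entries.
;; The entries are illustrative; ESRCH deliberately has no message.
(define toy-codeset
  '((1 EPERM "Operation not permitted")
    (2 ENOENT "No such file or directory")
    (3 ESRCH #f)))

;; Return the entry whose number or symbol is CODE, or #f if none matches.
(define (toy-entry codeset code)
  (let loop ((entries codeset))
    (cond ((null? entries) #f)
          ((or (eqv? code (car (car entries)))
               (eqv? code (cadr (car entries))))
           (car entries))
          (else (loop (cdr entries))))))

;; "It is an error" leaves the behavior open; this sketch chooses to raise.
(define (check-code code)
  (unless (or (symbol? code) (exact-integer? code))
    (error "code must be a symbol or an exact integer:" code)))

(define (toy-symbols codeset)
  (map cadr codeset))

(define (toy-symbol codeset code)
  (check-code code)
  (if (symbol? code)
      code                               ; symbols pass through as-is
      (let ((entry (toy-entry codeset code)))
        (and entry (cadr entry)))))

(define (toy-number codeset code)
  (check-code code)
  (if (exact-integer? code)
      code                               ; integers pass through as-is
      (let ((entry (toy-entry codeset code)))
        (and entry (car entry)))))

(define (toy-message codeset code)
  (check-code code)
  (let ((entry (toy-entry codeset code)))
    (and entry (list-ref entry 2))))     ; #f when no message is stored
```

With this table, (toy-symbol toy-codeset 99) and (toy-message toy-codeset 'ESRCH) both return #f, while (toy-symbol toy-codeset 'EWHATEVER) returns EWHATEVER unchanged. A linear search is used only for brevity; a real implementation is free to use hash tables or generated code.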

Examples

Basic usage

Implementations that provide the errno codeset are expected to behave as follows.

(codeset? 'errno) => #t

(codeset-symbols 'errno) => (EPERM ENOENT ESRCH ...)

(codeset-symbol 'errno 'EPERM) => EPERM
(codeset-symbol 'errno 1) => EPERM

(codeset-number 'errno 'EPERM) => 1
(codeset-number 'errno 1) => 1

(codeset-message 'errno 'EPERM) => "Operation not permitted"
(codeset-message 'errno 1) => "Operation not permitted"

The symbols may be listed in a different order. In some cases the number may differ and the message may be in a language other than English.

Promoting numbers to symbols

The following idiom promotes numeric error codes to symbolic ones where possible. For unknown codes it retains the number so that at least some information about the error is preserved.

(let ((code (or (codeset-symbol 'errno code)
                code)))
  (display code))
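
The same idiom can be extended to build a short description for error reporting. The procedure code->string below is a hypothetical helper, not part of this SRFI, and it assumes the library is importable as (srfi 238).

```scheme
(import (scheme base) (srfi 238))

;; Hypothetical helper: describe CODE as "NAME: message" when a message
;; is known, and as just the name (or the bare number) otherwise.
(define (code->string codeset code)
  (let* ((name (or (codeset-symbol codeset code) code))
         (name-string (if (symbol? name)
                          (symbol->string name)
                          (number->string name)))
         (message (codeset-message codeset code)))
    (if message
        (string-append name-string ": " message)
        name-string)))
```

On a system that behaves as in the Basic usage example, (code->string 'errno 1) would be expected to give "EPERM: Operation not permitted". For a code the implementation does not know, only the number survives.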

Enumerations

The following is a way to use R⁶RS enumerations with codesets.

(define-enumeration errno
  (EPERM
   ENOENT
   ESRCH
   EINTR
   EIO
   ENXIO
   E2BIG
   ENOEXEC
   EBADF
   ECHILD)
  errno-set)

(codeset-message 'errno (errno ENOENT))

The following is a simple way to roll your own enumeration.

(define-syntax define-symbols
  (syntax-rules ()
    ((define-symbols symbol ...)
     (begin (define symbol 'symbol) ...))))

(define-symbols
  EPERM
  ENOENT
  ESRCH
  EINTR
  EIO
  ENXIO
  E2BIG
  ENOEXEC
  EBADF
  ECHILD)

Now typos such as (codeset-message 'errno ECHILE) will be caught, because ECHILE is an unbound variable.

Introspection

When learning an interface, programmers often like to get an overview of the territory. The tools in this SRFI are accessible from the REPL, providing introspection capabilities to help.

First, let's define a helper procedure.

(define (sorted-symbols set)
  (list-sort (lambda (code1 code2)
               (< (codeset-number set code1)
                  (codeset-number set code2)))
             (codeset-symbols set)))

We can list all the errno symbols in numerical order. This places the oldest codes first, giving us a window into the evolution of Unix. The first 35 codes have to do with files and terminals, with pipes and non-blocking I/O added relatively late. Then comes the socket support.

(sorted-symbols 'errno)
=> (EPERM ENOENT ESRCH EINTR EIO ENXIO E2BIG ENOEXEC EBADF ECHILD
    EDEADLK ENOMEM EACCES EFAULT ENOTBLK EBUSY EEXIST EXDEV ENODEV
    ENOTDIR EISDIR EINVAL ENFILE EMFILE ENOTTY ETXTBSY EFBIG
    ENOSPC ESPIPE EROFS EMLINK EPIPE EDOM ERANGE EAGAIN EWOULDBLOCK
    EINPROGRESS EALREADY ENOTSOCK EDESTADDRREQ EMSGSIZE EPROTOTYPE
    ENOPROTOOPT EPROTONOSUPPORT ESOCKTNOSUPPORT EPFNOSUPPORT
    EAFNOSUPPORT EADDRINUSE EADDRNOTAVAIL ENETDOWN ...)

Likewise, we can list the earliest 30 Windows API error codes, using take from SRFI 1.

(take (sorted-symbols 'windows) 30)
=> (ERROR_SUCCESS ERROR_INVALID_FUNCTION ERROR_FILE_NOT_FOUND
    ERROR_PATH_NOT_FOUND ERROR_TOO_MANY_OPEN_FILES ERROR_ACCESS_DENIED
    ERROR_INVALID_HANDLE ERROR_ARENA_TRASHED ERROR_NOT_ENOUGH_MEMORY
    ERROR_INVALID_BLOCK ERROR_BAD_ENVIRONMENT ERROR_BAD_FORMAT
    ERROR_INVALID_ACCESS ERROR_INVALID_DATA ERROR_OUTOFMEMORY
    ERROR_INVALID_DRIVE ERROR_CURRENT_DIRECTORY ERROR_NOT_SAME_DEVICE
    ERROR_NO_MORE_FILES ERROR_WRITE_PROTECT ERROR_BAD_UNIT
    ERROR_NOT_READY ERROR_BAD_COMMAND ERROR_CRC ERROR_BAD_LENGTH
    ERROR_SEEK ERROR_NOT_DOS_DISK ERROR_SECTOR_NOT_FOUND
    ERROR_OUT_OF_PAPER ERROR_WRITE_FAULT)

We can also find the maximum Windows API error code. This gives us an idea of the complexity of the API, and tells us how many bits are needed to store codes.

(let ((set 'windows))
  (fold max 0 (map (lambda (x) (codeset-number set x))
                   (codeset-symbols set))))
=> 2250

A network programmer can list the HTTP 4xx codes, showing all the ways in which a request can fail.

(let ((set 'http))
  (map (lambda (code)
         (list (codeset-number set code)
               (codeset-message set code)))
       (filter (lambda (code)
                 (<= 400 (codeset-number set code) 499))
               (sorted-symbols set))))
=> ((400 "Bad Request") (401 "Unauthorized") (402 "Payment Required")
    (403 "Forbidden") (404 "Not Found") (405 "Method Not Allowed")
    (406 "Not Acceptable") (407 "Proxy Authentication Required")
    (408 "Request Timeout") (409 "Conflict") (410 "Gone")
    (411 "Length Required") (412 "Precondition Failed")
    (413 "Request Entity Too Large") (414 "Request URI Too Long")
    (415 "Unsupported Media Type") (416 "Requested Range Not Satisfiable")
    (417 "Expectation Failed") (418 "I'm a teapot")
    (421 "Misdirected Request") (422 "Unprocessable Entity")
    (423 "Locked") (424 "Failed Dependency") (425 "Too Early")
    (426 "Upgrade Required") (428 "Precondition Required")
    (429 "Too Many Requests") (431 "Request Header Fields Too Large")
    (451 "Unavailable For Legal Reasons"))

Implementation

An implementation for Gambit and Gauche is available in this SRFI’s repository. It provides two codesets, errno and signal, based on the codes of those Unix interfaces retrieved from the C programming language.

For the sake of curiosity, a null implementation and a Common Lisp implementation are also supplied.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Arthur A. Gleckler
