253: Data (Type-)Checking

by Artyom Bologov

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-253@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Table of contents

  1. Abstract
  2. Rationale
  3. Specification
    1. (check-arg predicate arg [caller])
    2. (values-checked (predicates ...) values ...)
    3. (check-case value (predicate body ...) ... [(else else-body ...)])
    4. (lambda-checked (args ...) body ...)
    5. (case-lambda-checked ((args ...) body ...) ...)
    6. (define-checked (name args ...) body ...) | (define-checked name predicate value)
    7. (define-record-type-checked type-name (constructor arg-name ...) predicate field ...)
  4. Implementation
    1. Caveat: Rest Argument Support Is Uneven
    2. check-arg—procedure or macro?
  5. Prior Art
    1. Guarding conditionals
    2. Ad hoc checks: check-*
    3. Coercion: ensure-*
    4. SRFI-145 (assume) and R6RS (assert)
    5. Implementation-specific (typing) primitives
    6. Other Lisps: Common Lisp, Clojure
  6. Acknowledgements
  7. Copyright

Abstract

Data validation and type checking (supposedly) make for more correct code. And faster code too, sometimes. And, in rare cases, code that's easier to follow than un-checked code. Unfortunately, Scheme does not have many (type-)checking primitives out of the box. This SRFI provides some, with the aim of allowing more performant and correct code with minimum effort on the user side. Both (manual) argument checking/validation (check-arg) and return value(s) (values-checked) checking/coercion are provided. Syntax sugar like define-checked and define-record-type-checked is added on top.

Rationale

Many Scheme code bases grow to a certain point—a point where entropy creates too diverse of an input set to ignore. That's the moment most programmers turn to contracts (in a loose sense of the word) and data validation. Other languages have type systems, object-orientation, type polymorphism, and other systems establishing code contracts. Scheme doesn't.

Scheme programmers and projects often compile a set of validation utilities. Be it for correctness (reliable data structure libraries, foreign function interfaces.) Or for speed (game engines, numeric libraries.) These utilities come in many shapes and implementations, but they can be split into several groups:

  1. Argument/data validation.
  2. Value coercion.
  3. Function/value/record type declaration.
  4. Type/property/predicate dispatching.

Implementations cater to these needs and cases, providing special forms/macros, usually for type declaration. The syntax and semantics of these differ wildly, though. If one wants to write correct, strongly typed, or performant code, one is stuck with non-portable APIs and implementation-specific hacks. Thus the need for standardization—one needs portable ways to ensure code validity/strictness.

The SRFIs most closely related to this one (and their differences) are:

SRFI-145 (assume)
One of the cited reasons for assume is performance gains. Which is consistent with this SRFI. The difference is the scope—this SRFI provides primitives covering most categories above. While assume is exclusively restricted to generic condition checking (1). And practical use of assume is too wordy, a more inlined syntax (like lambda-checked) never hurts.
SRFI-187 (alambda) and the like
A withdrawn family of SRFIs, the only ones bearing the "Type Checking" category. They fully cover the case of data validation and are good at establishing contracts in general. A few (minor) flaws: they're withdrawn; and the scope of the alambda SRFI is explicitly wider than type checking, mostly focusing on procedure arguments (3) in general.
SRFI-143 (Fixnums) and SRFI-144 (Flonums)
SRFIs defining two types of numbers that are "close to the metal" in being size and functionality restricted. These SRFIs are already used by multiple implementations (Guile and PreScheme, notably) in code optimization. And the notion of types is permeating both SRFIs. The downside of both flonum and fixnum SRFIs is that they don't generalize to more types. Which is a virtue in their particular context, but loss in terms of general (type-)checking. This SRFI fills the gap for more types.

See the Prior art section for more references and context.

This SRFI is novel in that it introduces a set of primitives explicitly geared for predicate/type checking. It is designed so that implementations can turn these primitives into cheap and strong type checks if need be. (The sample implementation already includes multiple implementation-specific optimizations, which proves the point.) No other goal is pursued but allowing more correct, strict, and optimized Scheme code.

Specification

Provided APIs can conceptually be split into two parts: basic and derived. (The distinction won't be reflected in this section structure, but should be trivial to guess.) Derived APIs are merely a syntactic sugar over basic primitives, but they are too useful for everyday programming to be left out. Derived operations also mirror the way some implementations handle typing/contracts at the procedure/record level. This solidifies existing practice, making it convenient to reproduce.

It is recommended that implementations actually throw checking errors in all the "it is an error" cases below. At least in debug mode. The possible type of error might be &assertion. Implementations are free to follow the original permissive "it is an error" behavior whenever deemed necessary, though.

(check-arg predicate arg [caller])

Guarantees that the arg (evaluated) conforms to the predicate (evaluated). Implementations can enforce the predicate check in all the code that follows, but are not required to. It is an error if predicate returns #f when called on arg. Otherwise, return value is unspecified. Implementations may use optional caller (evaluated) argument as the error who/origin if/when signaling a checking error.

One possible implementation, as found in SRFI-1, might be:

(define (check-arg pred val caller)
  (let lp ((val val))
    (if (pred val) val (lp (error "Bad argument" val pred caller)))))

Inspired by Common Lisp check-type special form.

(values-checked (predicates ...) values ...)

Guarantees that the values (evaluated) abide by the given predicates (evaluated) (the number of values and predicates should match) and returns them as multiple values. It is an error if any of the predicates returns false. Implementations may choose to coerce the values when the types are compatible (e.g. integer -> inexact). Supports multiple values:

(values-checked (integer?) 9) ;; Single value
(values-checked (integer? string?) 9 "hello") ;; Multiple values
(values-checked (integer? real? char?) 9 83.12 #\a) ;; More values
(values-checked (integer?) 9.0) ;; 9 or 9.0
(values-checked (string?) 9) ;; => unspecified, is an error

Notice that single value form still has the parentheses around the only predicate. This is to avoid ambiguity. Implementations must not omit parentheses. If presence of parentheses was a marker for multiple values, then more complex predicate would break it:

(values-checked (cut = <> 3) 9)

values-checked is inspired by Common Lisp the special form (one can already see the author comes from Common Lisp world), declaring the unambiguous type for the return value. The same inspiration can be traced in some Scheme implementations, like Chicken (see (Prior Art) Implementations).

(check-case value (predicate body ...) ... [(else else-body ...)])

check-case checks whether the value satisfies one of the predicates. If any of the predicates is satisfied, it evaluates the body corresponding to the first one that is satisfied. If none of the predicates is satisfied:

The return value is that of the satisfied clause body, else-body (when else clause is provided and no other clause matches), or unspecified.

(check-case x
 (integer? (+ 1 x)) ;; Return x + 1 if x is an integer.
 (string? (+ 1 (string->number x)))) ;; Return x' + 1 where x' is read from x.
;; Otherwise (x is neither integer? nor string?), it is an error.

Inspired by Common Lisp's typecase family of macros. And Chicken's compiler-typecase (already used/optimized in the sample implementation.)

(lambda-checked (args ...) body ...)

A regular lambda, but with any argument (except the rest argument) optionally having the form (name predicate) (as compared to default single-symbol form). Arguments of this extended form are guaranteed to satisfy the respective (evaluated) predicate. At least on procedure application time. This guarantee might be extended for all the procedure body, including for any modification and shadowing, at implementors' will. It is an error if either of the arguments does not satisfy the predicate.

;; See (Implementation) Caveat: Rest Argument Support Is Uneven for the caveats of using . rest pattern.
(define error
  (lambda-checked ((who symbol?) (message string?) . irritants)
    ...))

(case-lambda-checked ((args ...) body ...) ...)

Same as case-lambda, but with any argument taking a form of (name predicate) to be checked. See lambda-checked section for other details.

Important note: case-lambda does not dispatch over its argument types (4), it only checks them (3). The only dispatching that happens is argument number dispatching.

(define-checked (name args ...) body ...) | (define-checked name predicate value)

Defines a procedure or variable satisfying the given predicates. For procedures, effectively equal to define+lambda-checked:

(define-checked (error (who symbol?) (message string?) . irritants)
  ...)

For variables, checks the value (and, if implementation supports that, all the subsequent modifications) for the predicate match.

(define-checked message string? "Hi!")

(define-record-type-checked type-name (constructor arg-name ...) predicate field ...)

Defines a record type with checked constructor and field accessors/modifiers. type-name, constructor, and predicate are the same as R7RS define-record-type's (note especially the constructor—checks are not allowed in it, only arg-name symbols!) Fields are either of the form (field-name predicate accessor-name) or (field-name predicate accessor-name modifier-name). These ensure that accessor and modifier return checked data and check new data respectively. It is an error if any of the checks are not successful.

define-record-type-checked also guarantees that initial values passed to constructor are satisfying the predicates provided in field specification.

define-record-type-checked is modeled after PreScheme's define-record-type.

Implementation

Sample implementation is done in standard syntax-rules, which makes it fairly portable and minimalist. Chicken, Kawa, STklos, and Gauche-specific definitions with stronger typing/checking are also provided as cond-expands.

Source for the portable sample implementation. See the SRFI repository for other useful files.

Tested at least on

Caveat: Rest Argument Support Is Uneven

Significant limitation: sample implementation does not support rest arguments in lambda-checked and other lambda macros on some implementations. Notably Chicken and Gauche, because these don't allow rest pattern in macros. Sample implementation should seamlessly support rest arguments on implementations allowing them.

check-arg—procedure or macro?

Most SRFIs that use check-arg (1, 13, 44, 152) define it as a procedure. Sample implementation defines it as a macro, mostly to reuse more optimizable assert/assume underneath. Implementations are free to make it a procedure, and it should work like one.

Prior Art

This section is not normative. It is mostly a result of review of all the accessible SRFIs. (At the moment of writing, 0-252.) And, occasionally, implementation manuals and (mostly Guile-based) real-world programs. There are emergent patterns worth optimizing, and that's what this SRFI does.

Guarding conditionals

One's code only accepts certain types of things. The simplest way to ensure that the arguments are the right type of things is to check, which is what cond is for:

(cond
 ((integer? x) (+ 1 x))
 ((string? x) (+ 1 (string->number x)))
 (else ???))

This pattern has two problems:

Ad hoc checks: check-*

One obvious action on data mismatch is to raise an error. That's what the conventional check-* procedures tend to do. Check that conditions are satisfied and error out if they don't (1). Something like what SRFI-144 does:

(define (check-flonum! name x)
  (if (not (flonum? x))
      (error (string-append "non-flonum argument passed to "
                            (symbol->string name))
             x)))

The benefit of this pattern is that one can put a single check-* form at the start of the procedure and be done with it (SRFI-144 again):

(define (flop1 name op)
  (lambda (x)
    (check-flonum! name x)
    (let ((result (op x)))
      (if (not (flonum? result))
          (error (string-append "non-flonum result from "
                                (symbol->string name))
                 result))
      result)))

The most prevalent example of this strategy is the check-arg, giving the name to one of the primitives in this SRFI. It's present in SRFI-1, SRFI-13, SRFI-44, SRFI-152, and likely some others. The canonical implementation (as found in SRFI-1) is

(define (check-arg pred val caller)
  (let lp ((val val))
    (if (pred val) val (lp (error "Bad argument" val pred caller)))))

Coercion: ensure-*

Another strategy of dealing with malformed data is trying to coerce/fix it (2). That's what conventional ensure-* procedure do:

;; SRFI-175
(define (ensure-int x) (if (char? x) (char->integer x) x))

The benefit of the approach is that one is always certain the data is well-formed (to a certain extent.) Some data is too broken to be coerced, which raises a need for checking and erroring out.

SRFI-145 (assume) and R6RS (assert)

Another way to check that things are in order (1) is using the R6RS assert or SRFI-145 assume. These are primitives allowing to check whether the given expression is true and error out if it isn't. Same as check-* procedures, but more generic.

assume, in particular, is used in at least 9 SRFIs, mostly to do simple predicate checks. This use pattern highlights the need for predicate-checking primitive. A new primitive that might be tuned for performance, because performance is one of the motivations behind assume SRFI.

One problem with assume is that it gets too verbose at times. To the point of being longer than the actual procedure body. Take this example from SRFI-214:

(define (flexvector-ref fv index)
  (assume (flexvector? fv))
  (assume (integer? index))
  (assume (< -1 index (flexvector-length fv)))
  (vector-ref (vec fv) index))

One needs a shorter (likely inlined in the procedure definition) way to provide types/predicates. Otherwise it's too much visual clutter.

Implementation-specific (typing) primitives

Most big implementations caring for performance or correctness provide typing/checking APIs. The problem of assume verbosity from the previous section is solved at least by Bigloo, Chicken, Gambit/Gerbil, Kawa, and Typed Racket. Either through inline types in procedure definition

;; Kawa/Bigloo
(define (a x::integer)
  x)
;; Gambit
(define-procedure (a (x integer))
  x)
;; Typed Racket
(define a (lambda ([x : Integer]) x))

or through the separate declaration forms

;; Chicken
(: a (integer -> integer))
;; Typed Racket
(: a (-> Integer Integer))

With these ways to check procedures (3) for types (and sometimes predicates, with custom types some implementations provide) one gets 90% there. Procedures are almost all there is to Scheme, after all.

But some implementations aim for more, rivaling systems like Common Lisp or even statically typed languages. Chicken provides the for return type checking (2), and Kawa provides as, which additionally coerces the values whenever possible. Chicken also has the assume form (not to be confused with SRFI-145), which pins the variables to types for the duration of the body (1). Chicken even has compiler-typecase to cover type dispatch (4). Chicken has a whole assortment of checking tools.

Other Lisps: Common Lisp, Clojure

Most of these things won't be a surprise after talking about how advanced Scheme implementations get. Still, doesn't hurt highlighting things that likely were an inspiration (from Common Lisp) and that enhance the state of the art (in Clojure) of type/predicate checks.

The main departure point for this library is Common Lisp. Having assert and check-type for (1); the and coerce for (2); declare for (3); and (e|c|)typecase (not even mentioning the object system and type-dispatched methods) for (4), it's covering most of what one needs with types/predicates. Thanks to gradual strong typing system integrated into the language.

Clojure, while being restricted by JVM and functional design, has type annotations, conveniently allowing one to attach types to values (2). It also has pre/post conditions for functions, allowing to check the argument validity (3). And, finally, there's clojure.spec that allows to define arbitrarily complex checks for data and match things to the contracts (1).

Acknowledgements

Thanks to Pjotr Prins for the project where the necessity for this SRFI emerged. And for financing parts of this SRFI work.

Thanks to Arthur A. Gleckler for shepherding the whole process. And for accommodating and being patient about some of my chaotic changes.

Thanks goes to all the people on SRFI mailing list. In chronological order:

Marc Nieper-Wißkirchen
For mentioning &assert and suggesting stronger error behavior in this SRFI constructs
Antero Mejr
For multiple comments leading to significant clarifications of the text and for define-record-type-checked suggestion.
Wolfgang Corcoran-Mathe
For reiterations on stronger erroring behavior and input on check-case.
Juliana Sims
For:
Daphne Preston-Kendal
For convincing arguments in favor of check-case and other input.
Retropikzel, Amirouche, Lassi Kortela, and Yuval Langer
For multiple comments, especially regarding practical matters and implementation.

© 2024 Artyom Bologov.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler