253: Data (Type-)Checking

by Artyom Bologov

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-253@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2024-08-08
60-day deadline: 2024-10-12
Draft #1 published: 2024-08-13
Draft #2 published: 2024-08-27
Draft #3 published: 2024-09-01

Abstract
Rationale
Specification
Implementation
1. Caveat: Rest Argument Support Is Uneven
Design Decisions
Prior Art
Acknowledgements
Copyright

Abstract

Data validation and type checking (supposedly) make for more correct code. And faster code too, sometimes. And, in rare cases, code that's easier to follow than the un-checked one. Unfortunately, Scheme does not have many (type-)checking primitives out of the box. This SRFI provides some, with the aim of allowing more performant and correct code with minimum effort on the user side. Both (manual) argument checking/validation (check-arg) and return value(s) (values-checked) checking/coercion are provided. Syntax sugar like lambda-checked and define-checked is added on top.

Issues

None at present.

Rationale

Many Scheme code bases grow to a certain point. A point where entropy creates too diverse of an input set to ignore. That's where most programmers turn to contracts (in a loose sense of the word) and data validation. Other languages have type systems, object-orientation, type polymorphism, and other systems establishing code contracts. Scheme doesn't.

Scheme programs often grow a set of validation utilities. Be it for correctness (reliable data structure libraries, foreign function interfaces.) Or for speed (game engines, numeric libraries.) These utils come in many shapes and implementations, but they can be split into several groups:

Argument/data validation.
Value coercion.
Function/value type declaration.
Type/property/predicate dispatching.

Implementations cater to these needs and cases, providing special forms/macros, usually for type declaration. The syntax and semantics of these differ wildly, though. If one wants to write correct, strongly typed, or performant code, one is stuck with non-portable APIs. Thus the need for standardization—one needs portable ways to ensure code validity/strictness.

SRFIs most closely related to this one (and their differences) are:

SRFI-145 (assume): One of the cited reasons for assume is performance gains. Which is consistent with this SRFI. The difference is the scope—this SRFI provides primitives covering most categories above. While assume is exclusively restricted to generic condition checking (1). And practical use of assume is too wordy, a more inlined syntax (like lambda-checked) never hurts.
SRFI-187 (alambda) and the like: A withdrawn family of SRFIs, the only ones bearing the "Type Checking" category. Fully covering the case of data validation and are good at establishing contracts in general. A few (minor) flaws: they're withdrawn; and the scope of the alambda SRFI is explicitly wider than type checking, mostly focusing on procedure arguments (3) in general.

See the Prior art section for more references and context.

This SRFI is novel in that it introduces a set of primitives explicitly geared for predicate/type checking. It is designed so that implementations can turn these primitives into cheap and strong type checks if need be. (The sample implementation already includes multiple implementation-specific optimizations, which proves the point.) No other goal is pursued but allowing more correct, strict, and optimized Scheme code.

Specification

Provided APIs can conceptually be split into two parts: basic and derived. (The distinction won't be reflected in this section structure, but should be trivial to guess.) Derived APIs are merely a syntactic sugar over basic primitives, but they are too useful for everyday programming to be left out. They also mirror the way some implementations handle typing/contracts at the procedure level. This solidifies the existing practice, making it convenient to reproduce.

It is recommended that implementations actually throw checking errors in all the "it is an error" cases below. At least in debug mode. The possible type of error might be &assertion. Implementations are free to follow the original permissive "it is an error" behavior whenever deemed necessary, though.

Notice that (4) is explicitly not covered by this SRFI. See (Design decisions) match&predicates > check-case for why.

(check-arg predicate value . args)

Guarantees that the value (evaluated) conforms to the predicate (evaluated). Implementations can enforce the predicate check in all the code that follows, but are not required to. It is an error if predicate returns #f when called on value. Otherwise, returns non-false value. Implementations may use args in any non-portable way, but are advised to follow the convention of passing the who/caller as third argument.

One possible implementation, as found in SRFI-1, might be:

(define (check-arg pred val caller)
  (let lp ((val val))
    (if (pred val) val (lp (error "Bad argument" val pred caller)))))

The sample implementation defines check-arg as a macro (see (Design decisions) check-arg—procedure or macro?), but there's no reason it can't be a procedure instead.

Inspired by Common Lisp check-type special form.

(values-checked (predicates ...) values ...)

Guarantees that the values (evaluated) abide by the given predicates (evaluated) (the number of values and predicates should match) and returns them as multiple values. It is an error if any of the predicates returned false. Implementations may choose to coerce the values when the types are compatible (e.g. integer -> inexact). Supports multiple values:

(values-checked (integer?) 9) ;; Single value
(values-checked (integer? string?) 9 "hello") ;; Multiple values
(values-checked (integer? real? char?) 9 83.12 #\a) ;; More values
(values-checked (integer?) 9.0) ;; 9 or 9.0
(values-checked (string?) 9) ;; => unspecified, is an error

Notice that single value form still has the parentheses around the only predicate. This is to avoid ambiguity. Implementations must not omit parentheses. If presence of parens was a marker for multiple values, then more complex predicate would break it:

(values-checked (cut = <> 3) 9) ;; Uh oh, we need parentheses after all!

values-checked is inspired by Common Lisp the special form (you can already see the author comes from Common Lisp world,) declaring the unambiguous type for the return value. The same inspiration can be traced in some Scheme implementations, like Chicken (see (Prior Art) Implementations).

(lambda-checked (args ...) body ...)

A regular lambda, but with any argument (except the rest argument) optionally having the form (name predicate) (as compared to default single-symbol form). Arguments of this extended form are guaranteed to satisfy the respective (evaluated) predicate. At least on procedure application time. This guarantee might be extended for all the procedure body, including for any modification and shadowing, at implementors' will. It is an error if either of the arguments does not satisfy the predicate.

;; See (Implementation) Caveat: Rest Argument Support Is Uneven for the caveats of using . rest pattern.
(define error
  (lambda-checked ((who symbol?) (message string?) . irritants)
    ...))

(case-lambda-checked ((args ...) body ...) ...)

Same as case-lambda, but with any argument taking a form of (name predicate) to be checked. See lambda-checked section for other details.

(opt-lambda-checked (args ...) body ...)

Defined in case SRFI-227 is accessible. Same as opt-lambda, but every optional argument with a specified value can also specify a predicate to check it with. (It is an error if the check if unsuccessful.) The respective argument form is (name default-value predicate). Only the optionals with default value can have predicates (to avoid ambiguity.) See lambda-checked section for other details.

An additional define-optionals-checked is provided, equivalent to:

  (define name (opt-lambda-checked (arg ...) body ...))

(define-checked (name args ...) body ...) | (define-checked name predicate value)

Defines a procedure or variable satisfying the given predicates. For procedures, effectively equal to define+lambda-checked:

(define-checked (error (who symbol?) (message string?) . irritants)
  ...)

For variables, checks the value (and, if implementation supports that, all the subsequent modifications) for the predicate match.

(define-checked message string? "Hi!")

Implementation

Sample implementation is done in standard syntax-rules, which makes it fairly portable (and minimalist.) Chicken, Kawa, STklos, and Gauche-specific definitions with stronger typing/checking are also provided as cond-expand-s.

Source for the portable sample implementation. See the SRFI repository for other useful files.

Tested at least on

Guile: full support
Kawa: full support, values-checked coerces values whenever possible
STklos: full support
Chicken: partial support, with type checking only working in compile-time and rest argument support missing.
Gauche: partial support:
- Rest arguments are not supported.
- check-arg shadows the syntax provided by (gauche base)...
- ... and makes additional type checks with Gauche object system.
- No SRFI-227 support yet.
Gambit and Gerbil: broken, undebugged due to insufficient understanding of the implementation

Caveat: Rest Argument Support Is Uneven

Significant limitation: sample implementation does not support rest arguments in lambda-checked and other lambda macros on some implementations. Notably Chicken and Gauche, because these don't allow rest pattern in macros. Sample implementation should seamlessly support rest arguments on implementations allowing them.

Design Decisions

Performance vs. correctness

Checks might be used for performance or correctness. Sometimes these two motives oppose each other. More performant code would only like to know the number is IEEE double, while more correct code would need a number in a [1.0,49.3] range. This SRFI tries to find a middle ground: being minimalist about the check specs (single predicates) so that performance optimizations are possible for a certain known range of types; all the while allowing any predicate or a combination thereof, so that the ones seeking correctness of their code find it in this SRFI.

Minimalist implementation

The implementation is portable syntax-rules macros. This puts limitations on the power of the provided macros. But it's better to have minimalist yet portable SRFI than an implementation-specific kitchen sink. Implementation-specific kitchen sinks will be there anyway, no use in chasing them. What matters is establishing a baseline for (type-)checked procedures and primitives.

check-arg—procedure or macro?

Most SRFIs that use check-arg (1, 13, 44, 152) define it as a procedure. This library defines it as a macro. Mostly to reuse more optimizable assert/assume underneath. Implementations are free to make it a function. That's why there's a . rest argument in check-arg—it's intended for extension. And that's why check-arg reliant primitives pass their names into check-arg. To make implementation's work easier.

Why even include check-arg if there are assert/assume?

check-arg is included for completeness and consistency of the SRFI. And to highlight the existing practice. (Gauche even has its own check-arg that this SRFI catches up to!) It's also expected that check-arg will be easier to optimize for simple predicate case than more generic assert/assume.

match&predicates > check-case

An earlier sample implementation contained check-case primitive, matching the data by predicates. This macro doesn't belong here, though. SRFI-241 and all the practice preceding it already have a way to match predicates. So one only has to use SRFI-241, Wright-Cartwright-Shinn, or implementation-provided match macros to get Common Lisp typecase behavior (4). And some implementations (like Racket) already optimize/type-check this use-case, so no need for specialized construct.

Here's how the type/check-matching incantation might look like (in Wright-Cartwright-Shinn implementation):

(match x
  ((? integer? x) (+ 1 x))
  ((? string? x) (+ 1 (string->number x)))
  (_ (error "Not the right type")))

Removal of let-checked

Previously defined (drafts 1 and 2) let-checked was removed in draft 3. For the same reason this SRFI and SRFI-187 differ—this one is checking-only. let-checked had too much features for questionable benefit. And there was no implementation-specific constructs (except Chicken's assume, but it is explicitly marked as an equivalent of let + the (equivalent to values-checked of this SRFI)) to serve as a back-end for it.

Union/intersection/etc. Types

There's no union/intersection/exclusion types built into the library. These require hard-coding them in every macro, and there's always a question of where to stop adding these new types (do we need "list-of" type? "every" type? "sequence-of"? nested record types?) It's also non-trivial to handle the nested types like (and number? (not integer?)) in syntax-rules (see the (Design decisions) Minimalist implementation part above) Given that Scheme type checks are predicate-based, one can use SRFI-235 combinators (like disjoin or conjoin) or define their own to have custom union type anyway. Member and equality types are simple enough to not need special syntax at all. SRFI-26 could work just fine for this:

(check-arg (cut memv <> '(1 2 3)) argument)
(check-arg (cut eq? <> 'foo) argument)

This approach of not having union etc. checks makes implementation-specific optimizations harder. But implementations like Guile already optimize based on assert and code path analysis. Additional indirection function is optimizable enough. Especially so in the context of the primitives this library provides. Implementations may provide more optimized combinators (say, check-or) on top of this library.

To Err or Not to Err

check-arg and other primitives use SRFI-145 assume instead of standard assert whenever possible. assume is more strict than assert, erroring in more cases. Which highlights an important design decision: it's better to error more. Errors highlight the broken contract, which is useful in development, prompting contract/code refinement. And contract violations that happen at runtime can potentially break a lot. That's why check-arg uses assume. It's better to err than break silently.

Type-checking . rest arg in lambda-checked

One useful reference is Common Lisp: it allows annotating the &rest argument with its element types. Probably what any &rest argument user wants.

(declare (ftype (function (number &rest number) number) +))

Such a feature is omitted from this SRFI (although implementations might add it if they wish to.) Both because it's not trivial to implement and because it asks for a "list-of" primitive (see (Design Decisions) Union/intersection/etc. Types), which is a non-goal complexity level.

Prefix or postfix types?

This SRFI uses postfix types in derived macros like lambda-checked and define-checked. Which is in contrast to languages like C, where types are (mostly) prefix. The choice is subjective, but also consistent with what other Lisps tend to do. Common Lisp defmethod specifiers come after variable name. In general, optional/special data comes after the argument name in procedure definition in most Lisps. And modern C-family languages like Rust and Go embrace postfix types too. More natural to read this way. Name first, auxiliary information last.

Inconsistently enough, check-arg and values-checked put predicates/types first. The heuristic here might be that names go before predicates (as in lambda-checked args,) while predicates go before values (as in values-checked.) Given that some of the provided primitives don't reference names, predicates go first, followed by values.

Prior Art

This section is mostly a result of review of all the accessible SRFIs. (At the moment of writing, 0-252.) And, occasionally, implementation manuals and (mostly Guile-based) real-world programs. There are emergent patterns worth optimizing, and that's what this SRFI does.

Guarding conditionals

The code only accepts certain types of things. The simplest way to ensure that the arguments are the right type of things is to check. Which is what cond is for:

(cond
 ((integer? x) (+ 1 x))
 ((string? x) (+ 1 (string->number x)))
 (else ???))

This pattern has two problems:

One needs to list out all the conditions and potentially duplicate code.
It's not clear what to do when none of the conditions are satisfied.

Ad hoc checks: check-*

One obvious action on data mismatch is to raise an error. That's what the conventional check-* procedures tend to do. Check that conditions are satisfied and error out if they don't (1). Something like what SRFI-144 does:

(define (check-flonum! name x)
  (if (not (flonum? x))
      (error (string-append "non-flonum argument passed to "
                            (symbol->string name))
             x)))

The benefit of this pattern is that one can put a single check-* form at the start of the procedure and be done with it (SRFI-144 again):

(define (flop1 name op)
  (lambda (x)
    (check-flonum! name x)
    (let ((result (op x)))
      (if (not (flonum? result))
          (error (string-append "non-flonum result from "
                                (symbol->string name))
                 result))
      result)))

The most prevalent example of this strategy is the check-arg, giving the name to one of the primitives in this SRFI. It's present in SRFI-1, SRFI-13, SRFI-44, SRFI-152, and likely some others. The canonical implementation (as found in SRFI-1) is

(define (check-arg pred val caller)
  (let lp ((val val))
    (if (pred val) val (lp (error "Bad argument" val pred caller)))))

Coercion: ensure-*

Another strategy of dealing with malformed data is trying to coerce/fix it (2). That's what conventional ensure-* functions do:

;; SRFI-175
(define (ensure-int x) (if (char? x) (char->integer x) x))

The benefit of the approach is that one is always certain the data is well-formed. Well, to a certain extent. Some data is too broken to be coerced, which raises a need for checking and erroring out.

SRFI-145 (assume) and R6RS (assert)

Another way to check that things are in order (1) is using the R⁶RS assert or SRFI-145 assume. Primitives allowing to check whether the given expression is true and error out if it isn't. Same as check-* procedures, but more generic.

assume, in particular, is used in at least 9 SRFIs. Mostly to do simple predicate checks. Which highlights the need for predicate-checking primitive. Maybe even the one tuned for performance, because performance is one of the motivations behind assume SRFI.

One problem with assume is that it gets too verbose at times. To the point of being longer than the actual function body. Take this example from SRFI-214:

(define (flexvector-ref fv index)
  (assume (flexvector? fv))
  (assume (integer? index))
  (assume (< -1 index (flexvector-length fv)))
  (vector-ref (vec fv) index))

So one needs some shorter (likely inlined in the procedure definition) way to provide types/predicates. Otherwise it's too much visual clutter.

Implementation-specific (typing) primitives

Most big implementations caring for performance or correctness provide typing/checking APIs. The problem of assume verbosity from the previous section is solved at least by Bigloo, Chicken, Gambit/Gerbil, Kawa, and Typed Racket. Either through inline types in procedure definition

;; Kawa/Bigloo
(define (a x::integer)
  x)
;; Gambit
(define-procedure (a (x integer))
  x)
;; Typed Racket
(define a (lambda ([x : Integer]) x))

or through the separate declaration forms

;; Chicken
(: a (integer -> integer))
;; Typed Racket
(: a (-> Integer Integer))

With these ways to check procedures (3) for types (and sometimes predicates, with custom types some implementations provide) one gets 90% there. Procedures are almost all there is to Scheme, after all.

But some implementations aim for more, rivaling systems like Common Lisp or even statically typed languages. Chicken provides the for return type checking (2), and Kawa provides as, which additionally coerces the values whenever possible. Chicken also has the assume form (not to be confused with SRFI-145) , which pins the variables to types for the duration of the body (1). Chicken even has compiler-typecase to cover type dispatch (4). Chicken has lots of things, really. Magnificent times we live in—Scheme implementations reaching the level of big ones.

Other Lisps: Common Lisp, Clojure

Most of these things won't be a surprise after talking about how advanced Scheme implementations get. Still, doesn't hurt highlighting things that likely were an inspiration (from Common Lisp) and that enhance the state of the art (in Clojure) of type/predicate checks.

The main departure point for this library is Common Lisp. Having assert and check-type for (1); the and coerce for (2); declare for (3); and (e|c|)typecase (not even mentioning the object system and type-dispatched methods) for (4), it's covering most of what one needs with types/predicates. First-class-ish types, after all 🤷

Clojure, while being restricted by JVM and functional design, has type annotations, conveniently allowing one to attach types to values (2). It also has pre/post conditions for functions, allowing to check the argument validity (3). And, finally, there's clojure.spec that allows to define arbitrarily complex checks for data and match things to the contracts (1). Lots of fun things.

Acknowledgements

??? credit where it is due. For example, please consider acknowledging people on the SRFI mailing list who have contributed to the discussion.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Arthur A. Gleckler