by Artyom Bologov
This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-253@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Data validation and type checking (supposedly) make for more correct code.
And faster code too, sometimes.
And, in rare cases, code that's easier to follow than un-checked code.
Unfortunately, Scheme does not have many (type-)checking primitives out of the box.
This SRFI provides some, with the aim of allowing more performant and correct code with minimum effort on the user side.
Both (manual) argument checking/validation (check-arg
)
and return value(s) (values-checked
) checking/coercion are provided.
Syntax sugar like define-checked
and define-record-type-checked
is added on top.
Many Scheme code bases grow to a certain point—a point where entropy creates too diverse of an input set to ignore. That's the moment most programmers turn to contracts (in a loose sense of the word) and data validation. Other languages have type systems, object-orientation, type polymorphism, and other systems establishing code contracts. Scheme doesn't.
Scheme programmers and projects often compile a set of validation utilities. Be it for correctness (reliable data structure libraries, foreign function interfaces.) Or for speed (game engines, numeric libraries.) These utilities come in many shapes and implementations, but they can be split into several groups:
Implementations cater to these needs and cases, providing special forms/macros, usually for type declaration. The syntax and semantics of these differ wildly, though. If one wants to write correct, strongly typed, or performant code, one is stuck with non-portable APIs and implementation-specific hacks. Thus the need for standardization—one needs portable ways to ensure code validity/strictness.
The SRFIs most closely related to this one (and their differences) are:
assume
is performance gains.
Which is consistent with this SRFI.
The difference is the scope—this SRFI provides primitives covering most categories above.
While assume
is exclusively restricted to generic condition checking (1).
And practical use of assume
is too wordy, a more inlined syntax (like lambda-checked
) never hurts.
See the Prior art section for more references and context.
This SRFI is novel in that it introduces a set of primitives explicitly geared for predicate/type checking. It is designed so that implementations can turn these primitives into cheap and strong type checks if need be. (The sample implementation already includes multiple implementation-specific optimizations, which proves the point.) No other goal is pursued but allowing more correct, strict, and optimized Scheme code.
Provided APIs can conceptually be split into two parts: basic and derived. (The distinction won't be reflected in this section structure, but should be trivial to guess.) Derived APIs are merely a syntactic sugar over basic primitives, but they are too useful for everyday programming to be left out. Derived operations also mirror the way some implementations handle typing/contracts at the procedure/record level. This solidifies existing practice, making it convenient to reproduce.
It is recommended that implementations actually throw checking errors in all the "it is an error" cases below.
At least in debug mode.
The possible type of error might be &assertion
.
Implementations are free to follow the original permissive "it is an error" behavior whenever deemed necessary, though.
Guarantees that the arg
(evaluated) conforms to the predicate
(evaluated).
Implementations can enforce the predicate check in all the code that follows, but are not required to.
It is an error if predicate
returns #f
when called on arg
.
Otherwise, return value is unspecified.
Implementations may use optional caller
(evaluated) argument as the error who/origin if/when signaling a checking error.
One possible implementation, as found in SRFI-1, might be:
(define (check-arg pred val caller) (let lp ((val val)) (if (pred val) val (lp (error "Bad argument" val pred caller)))))
Inspired by Common Lisp check-type
special form.
Guarantees that the values
(evaluated) abide by the given predicates
(evaluated)
(the number of values and predicates should match) and returns them as multiple values.
It is an error if any of the predicates
returns false.
Implementations may choose to coerce the values when the types are compatible (e.g. integer -> inexact).
Supports multiple values:
(values-checked (integer?) 9) ;; Single value (values-checked (integer? string?) 9 "hello") ;; Multiple values (values-checked (integer? real? char?) 9 83.12 #\a) ;; More values (values-checked (integer?) 9.0) ;; 9 or 9.0 (values-checked (string?) 9) ;; => unspecified, is an error
Notice that single value form still has the parentheses around the only predicate. This is to avoid ambiguity. Implementations must not omit parentheses. If presence of parentheses was a marker for multiple values, then more complex predicate would break it:
(values-checked (cut = <> 3) 9)
values-checked
is inspired by Common Lisp the
special form
(one can already see the author comes from Common Lisp world),
declaring the unambiguous type for the return value.
The same inspiration can be traced in some Scheme implementations, like Chicken
(see (Prior Art) Implementations).
check-case
checks whether the value
satisfies one of the predicate
s.
If any of the predicate
s is satisfied, it evaluates the body
corresponding to the first one that is satisfied.
If none of the predicate
s is satisfied:
else
clause, it evaluates the respective else-body
.
The return value is that of the satisfied clause body
, else-body
(when else
clause is provided and no other clause matches), or unspecified.
(check-case x (integer? (+ 1 x)) ;; Return x + 1 if x is an integer. (string? (+ 1 (string->number x)))) ;; Return x' + 1 where x' is read from x. ;; Otherwise (x is neither integer? nor string?), it is an error.
Inspired by Common Lisp's typecase family of macros. And Chicken's compiler-typecase (already used/optimized in the sample implementation.)
A regular lambda, but with any argument (except the rest argument) optionally having the form (name predicate)
(as compared to default single-symbol form).
Arguments of this extended form are guaranteed to satisfy the respective (evaluated) predicate
.
At least on procedure application time.
This guarantee might be extended for all the procedure body, including for any modification and shadowing, at implementors' will.
It is an error if either of the arguments does not satisfy the predicate.
;; See (Implementation) Caveat: Rest Argument Support Is Uneven for the caveats of using . rest pattern. (define error (lambda-checked ((who symbol?) (message string?) . irritants) ...))
Same as case-lambda
, but with any argument taking a form of (name predicate)
to be checked.
See lambda-checked section for other details.
Important note: case-lambda
does not dispatch over its argument types (4),
it only checks them (3).
The only dispatching that happens is argument number dispatching.
Defines a procedure or variable satisfying the given predicates.
For procedures, effectively equal to define+lambda-checked
:
(define-checked (error (who symbol?) (message string?) . irritants) ...)
For variables, checks the value
(and, if implementation supports that, all the subsequent modifications) for the predicate
match.
(define-checked message string? "Hi!")
Defines a record type with checked constructor and field accessors/modifiers.
type-name
, constructor
, and predicate
are the same as R7RS define-record-type
's
(note especially the constructor—checks are not allowed in it, only arg-name
symbols!)
Fields are either of the form (field-name predicate accessor-name)
or (field-name predicate accessor-name modifier-name)
.
These ensure that accessor and modifier return checked data and check new data respectively.
It is an error if any of the checks are not successful.
define-record-type-checked
also guarantees that initial values passed to constructor are satisfying the predicates provided in field specification.
define-record-type-checked
is modeled after PreScheme's define-record-type
.
Sample implementation is done in standard syntax-rules, which makes it fairly portable and minimalist.
Chicken, Kawa, STklos, and Gauche-specific definitions with stronger typing/checking are also provided as cond-expand
s.
Source for the portable sample implementation. See the SRFI repository for other useful files.
Tested at least on
values-checked
coerces values whenever possible
check-arg
is not allowed in top-level context, only in wrapping block/procedure.
Which is not much of a hindrance, given that the main use-case is checking procedure arguments.
This divergence enables block-level optimizations.
check-arg
shadows the syntax provided by (gauche base)
...
Significant limitation: sample implementation does not support rest arguments in lambda-checked
and other lambda macros on some implementations.
Notably Chicken and Gauche, because these don't allow rest pattern in macros.
Sample implementation should seamlessly support rest arguments on implementations allowing them.
Most SRFIs that use check-arg
(1, 13, 44, 152) define it as a procedure.
Sample implementation defines it as a macro,
mostly to reuse more optimizable assert
/assume
underneath.
Implementations are free to make it a procedure, and it should work like one.
This section is not normative. It is mostly a result of review of all the accessible SRFIs. (At the moment of writing, 0-252.) And, occasionally, implementation manuals and (mostly Guile-based) real-world programs. There are emergent patterns worth optimizing, and that's what this SRFI does.
One's code only accepts certain types of things.
The simplest way to ensure that the arguments are the right type of things is to check, which is what cond
is for:
(cond ((integer? x) (+ 1 x)) ((string? x) (+ 1 (string->number x))) (else ???))
This pattern has two problems:
One obvious action on data mismatch is to raise an error.
That's what the conventional check-*
procedures tend to do.
Check that conditions are satisfied and error out if they don't (1).
Something like what SRFI-144 does:
(define (check-flonum! name x) (if (not (flonum? x)) (error (string-append "non-flonum argument passed to " (symbol->string name)) x)))
The benefit of this pattern is that one can put a single check-*
form at the start of the procedure and be done with it (SRFI-144 again):
(define (flop1 name op) (lambda (x) (check-flonum! name x) (let ((result (op x))) (if (not (flonum? result)) (error (string-append "non-flonum result from " (symbol->string name)) result)) result)))
The most prevalent example of this strategy is the check-arg
, giving the name to one of the primitives in this SRFI.
It's present in SRFI-1, SRFI-13, SRFI-44, SRFI-152, and likely some others.
The canonical implementation (as found in SRFI-1) is
(define (check-arg pred val caller) (let lp ((val val)) (if (pred val) val (lp (error "Bad argument" val pred caller)))))
Another strategy of dealing with malformed data is trying to coerce/fix it (2).
That's what conventional ensure-*
procedure do:
;; SRFI-175 (define (ensure-int x) (if (char? x) (char->integer x) x))
The benefit of the approach is that one is always certain the data is well-formed (to a certain extent.) Some data is too broken to be coerced, which raises a need for checking and erroring out.
Another way to check that things are in order (1) is using the R6RS assert
or SRFI-145 assume
.
These are primitives allowing to check whether the given expression is true and error out if it isn't.
Same as check-*
procedures, but more generic.
assume
, in particular, is used in at least 9 SRFIs, mostly to do simple predicate checks.
This use pattern highlights the need for predicate-checking primitive.
A new primitive that might be tuned for performance, because performance is one of the motivations behind assume
SRFI.
One problem with assume
is that it gets too verbose at times.
To the point of being longer than the actual procedure body.
Take this example from SRFI-214:
(define (flexvector-ref fv index) (assume (flexvector? fv)) (assume (integer? index)) (assume (< -1 index (flexvector-length fv))) (vector-ref (vec fv) index))
One needs a shorter (likely inlined in the procedure definition) way to provide types/predicates. Otherwise it's too much visual clutter.
Most big implementations caring for performance or correctness provide typing/checking APIs.
The problem of assume
verbosity from the previous section is solved at least by Bigloo, Chicken, Gambit/Gerbil, Kawa, and Typed Racket.
Either through inline types in procedure definition
;; Kawa/Bigloo (define (a x::integer) x) ;; Gambit (define-procedure (a (x integer)) x) ;; Typed Racket (define a (lambda ([x : Integer]) x))
or through the separate declaration forms
;; Chicken (: a (integer -> integer)) ;; Typed Racket (: a (-> Integer Integer))
With these ways to check procedures (3) for types (and sometimes predicates, with custom types some implementations provide) one gets 90% there. Procedures are almost all there is to Scheme, after all.
But some implementations aim for more, rivaling systems like Common Lisp or even statically typed languages.
Chicken provides the
for return type checking (2), and Kawa provides as
, which additionally coerces the values whenever possible.
Chicken also has the assume
form (not to be confused with SRFI-145), which pins the variables to types for the duration of the body (1).
Chicken even has compiler-typecase
to cover type dispatch (4).
Chicken has a whole assortment of checking tools.
Most of these things won't be a surprise after talking about how advanced Scheme implementations get. Still, doesn't hurt highlighting things that likely were an inspiration (from Common Lisp) and that enhance the state of the art (in Clojure) of type/predicate checks.
The main departure point for this library is Common Lisp.
Having assert
and check-type
for (1);
the
and coerce
for (2);
declare
for (3);
and (e|c|)typecase
(not even mentioning the object system and type-dispatched methods) for (4),
it's covering most of what one needs with types/predicates.
Thanks to gradual strong typing system integrated into the language.
Clojure, while being restricted by JVM and functional design, has type annotations, conveniently allowing one to attach types to values (2).
It also has pre/post conditions
for functions, allowing to check the argument validity (3).
And, finally, there's clojure.spec
that allows to define arbitrarily complex checks for data and match things to the contracts (1).
Thanks to Pjotr Prins for the project where the necessity for this SRFI emerged. And for financing parts of this SRFI work.
Thanks to Arthur A. Gleckler for shepherding the whole process. And for accommodating and being patient about some of my chaotic changes.
Thanks goes to all the people on SRFI mailing list. In chronological order:
&assert
and suggesting stronger error behavior in this SRFI constructsdefine-record-type-checked
suggestion.check-case
.check-case
and other input.© 2024 Artyom Bologov.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.