by Artyom Bologov
This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-253@nospamsrfi.schemers.org
. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
Data validation and type checking (supposedly) make for more correct code.
And faster code too, sometimes.
And, in rare cases, code that's easier to follow than the un-checked one.
Unfortunately, Scheme does not have many (type-)checking primitives out of the box.
This SRFI provides some, with the aim of allowing more performant and correct code with minimum effort on the user side.
Both (manual) argument checking/validation (check-arg
)
and return value(s) (values-checked
) checking/coercion are provided.
Syntax sugar like lambda-checked
and define-checked
is added on top.
None at present.
Many Scheme code bases grow to a certain point. A point where entropy creates too diverse of an input set to ignore. That's where most programmers turn to contracts (in a loose sense of the word) and data validation. Other languages have type systems, object-orientation, type polymorphism, and other systems establishing code contracts. Scheme doesn't.
Scheme programs often grow a set of validation utilities. Be it for correctness (reliable data structure libraries, foreign function interfaces.) Or for speed (game engines, numeric libraries.) These utils come in many shapes and implementations, but they can be split into several groups:
Implementations cater to these needs and cases, providing special forms/macros, usually for type declaration. The syntax and semantics of these differ wildly, though. If one wants to write correct, strongly typed, or performant code, one is stuck with non-portable APIs. Thus the need for standardization—one needs portable ways to ensure code validity/strictness.
SRFIs most closely related to this one (and their differences) are:
assume
is performance gains.
Which is consistent with this SRFI.
The difference is the scope—this SRFI provides primitives covering most categories above.
While assume
is exclusively restricted to generic condition checking (1).
And practical use of assume
is too wordy, a more inlined syntax (like lambda-checked
) never hurts.
See the Prior art section for more references and context.
This SRFI is novel in that it introduces a set of primitives explicitly geared for predicate/type checking. It is designed so that implementations can turn these primitives into cheap and strong type checks if need be. (The sample implementation already includes multiple implementation-specific optimizations, which proves the point.) No other goal is pursued but allowing more correct, strict, and optimized Scheme code.
Provided APIs can conceptually be split into two parts: basic and derived. (The distinction won't be reflected in this section structure, but should be trivial to guess.) Derived APIs are merely a syntactic sugar over basic primitives, but they are too useful for everyday programming to be left out. They also mirror the way some implementations handle typing/contracts at the procedure level. This solidifies the existing practice, making it convenient to reproduce.
It is recommended that implementations actually throw checking errors in all the "it is an error" cases below.
At least in debug mode.
The possible type of error might be &assertion
.
Implementations are free to follow the original permissive "it is an error" behavior whenever deemed necessary, though.
Notice that (4) is explicitly not covered by this SRFI. See (Design decisions) match&predicates > check-case for why.
Guarantees that the value
(evaluated) conforms to the predicate
(evaluated).
Implementations can enforce the predicate check in all the code that follows, but are not required to.
It is an error if predicate
returns #f
when called on value
.
Otherwise, returns non-false value.
Implementations may use args
in any non-portable way, but are advised to follow the convention of passing the who/caller as third argument.
One possible implementation, as found in SRFI-1, might be:
(define (check-arg pred val caller) (let lp ((val val)) (if (pred val) val (lp (error "Bad argument" val pred caller)))))
The sample implementation defines check-arg
as a macro
(see (Design decisions) check-arg—procedure or macro?),
but there's no reason it can't be a procedure instead.
Inspired by Common Lisp check-type
special form.
Guarantees that the values
(evaluated) abide by the given predicates
(evaluated)
(the number of values and predicates should match) and returns them as multiple values.
It is an error if any of the predicates
returned false.
Implementations may choose to coerce the values when the types are compatible (e.g. integer -> inexact).
Supports multiple values:
(values-checked (integer?) 9) ;; Single value (values-checked (integer? string?) 9 "hello") ;; Multiple values (values-checked (integer? real? char?) 9 83.12 #\a) ;; More values (values-checked (integer?) 9.0) ;; 9 or 9.0 (values-checked (string?) 9) ;; => unspecified, is an error
Notice that single value form still has the parentheses around the only predicate. This is to avoid ambiguity. Implementations must not omit parentheses. If presence of parens was a marker for multiple values, then more complex predicate would break it:
(values-checked (cut = <> 3) 9) ;; Uh oh, we need parentheses after all!
values-checked
is inspired by Common Lisp the
special form
(you can already see the author comes from Common Lisp world,)
declaring the unambiguous type for the return value.
The same inspiration can be traced in some Scheme implementations, like Chicken
(see (Prior Art) Implementations).
A regular lambda, but with any argument (except the rest argument) optionally having the form (name predicate)
(as compared to default single-symbol form).
Arguments of this extended form are guaranteed to satisfy the respective (evaluated) predicate
.
At least on procedure application time.
This guarantee might be extended for all the procedure body, including for any modification and shadowing, at implementors' will.
It is an error if either of the arguments does not satisfy the predicate.
;; See (Implementation) Caveat: Rest Argument Support Is Uneven for the caveats of using . rest pattern. (define error (lambda-checked ((who symbol?) (message string?) . irritants) ...))
Same as case-lambda
, but with any argument taking a form of (name predicate)
to be checked.
See lambda-checked section for other details.
Defined in case SRFI-227 is accessible.
Same as opt-lambda
, but every optional argument with a specified value can also specify a predicate to check it with.
(It is an error if the check if unsuccessful.)
The respective argument form is (name default-value predicate)
.
Only the optionals with default value can have predicates (to avoid ambiguity.)
See lambda-checked section for other details.
An additional define-optionals-checked
is provided, equivalent to:
(define name (opt-lambda-checked (arg ...) body ...))
Defines a procedure or variable satisfying the given predicates.
For procedures, effectively equal to define+lambda-checked
:
(define-checked (error (who symbol?) (message string?) . irritants) ...)
For variables, checks the value
(and, if implementation supports that, all the subsequent modifications) for the predicate
match.
(define-checked message string? "Hi!")
Defines a record type with checked constructor and field accessors/modifiers.
type-name
, constructor
, and predicate
are the same as R7RS define-record-type
's.
Fields are either of the form (field-name predicate accessor-name)
or (field-name predicate accessor-name modifier-name)
.
These ensure that accessor and modifier return checked data and check new data respectively.
It is an error if any of the checks are not successful.
define-record-type-checked
also guarantees that initial values passed to constructor are satisfying the predicates provided in field specification.
define-record-type-checked
is modeled after PreScheme's define-record-type
.
Sample implementation is done in standard syntax-rules, which makes it fairly portable
(and minimalist.)
Chicken, Kawa, STklos, and Gauche-specific definitions with stronger typing/checking are also provided as cond-expand
-s.
Source for the portable sample implementation. See the SRFI repository for other useful files.
Tested at least on
values-checked
coerces values whenever possible
check-arg
shadows the syntax provided by (gauche base)
...
Significant limitation: sample implementation does not support rest arguments in lambda-checked
and other lambda macros on some implementations.
Notably Chicken and Gauche, because these don't allow rest pattern in macros.
Sample implementation should seamlessly support rest arguments on implementations allowing them.
Checks might be used for performance or correctness. Sometimes these two motives oppose each other. More performant code would only like to know the number is IEEE double, while more correct code would need a number in a [1.0,49.3] range. This SRFI tries to find a middle ground: being minimalist about the check specs (single predicates) so that performance optimizations are possible for a certain known range of types; all the while allowing any predicate or a combination thereof, so that the ones seeking correctness of their code find it in this SRFI.
The implementation is portable syntax-rules
macros.
This puts limitations on the power of the provided macros.
But it's better to have minimalist yet portable SRFI than an implementation-specific kitchen sink.
Implementation-specific kitchen sinks will be there anyway, no use in chasing them.
What matters is establishing a baseline for (type-)checked procedures and primitives.
Most SRFIs that use check-arg
(1, 13, 44, 152) define it as a procedure.
This library defines it as a macro.
Mostly to reuse more optimizable assert
/assume
underneath.
Implementations are free to make it a function.
That's why there's a . rest
argument in check-arg
—it's intended for extension.
And that's why check-arg
reliant primitives pass their names into check-arg
.
To make implementation's work easier.
check-arg
is included for completeness and consistency of the SRFI.
And to highlight the existing practice.
(Gauche even has its own check-arg
that this SRFI catches up to!)
It's also expected that check-arg will be easier to optimize for simple predicate case than more generic assert
/assume
.
An earlier sample implementation contained check-case
primitive, matching the data by predicates.
This macro doesn't belong here, though.
SRFI-241 and all the practice preceding it already have a way to match predicates.
So one only has to use SRFI-241, Wright-Cartwright-Shinn, or implementation-provided match
macros to get Common Lisp typecase
behavior (4).
And some implementations (like Racket) already optimize/type-check this use-case, so no need for specialized construct.
Here's how the type/check-matching incantation might look like (in Wright-Cartwright-Shinn implementation):
(match x ((? integer? x) (+ 1 x)) ((? string? x) (+ 1 (string->number x))) (_ (error "Not the right type")))
Previously defined (drafts 1 and 2) let-checked
was removed in draft 3.
For the same reason this SRFI and SRFI-187 differ—this one is checking-only.
let-checked
had too much features for questionable benefit.
And there was no implementation-specific constructs
(except Chicken's assume
,
but it is explicitly marked as an equivalent of let
+ the
(equivalent to values-checked
of this SRFI))
to serve as a back-end for it.
There's no union/intersection/exclusion types built into the library.
These require hard-coding them in every macro, and there's always a question of where to stop adding these new types
(do we need "list-of" type? "every" type? "sequence-of"? nested record types?)
It's also non-trivial to handle the nested types like (and number? (not integer?))
in syntax-rules
(see the (Design decisions) Minimalist implementation part above)
Given that Scheme type checks are predicate-based,
one can use SRFI-235 combinators
(like disjoin
or conjoin
) or define their own to have custom union type anyway.
Member and equality types are simple enough to not need special syntax at all.
SRFI-26 could work just fine for this:
(check-arg (cut memv <> '(1 2 3)) argument) (check-arg (cut eq? <> 'foo) argument)
This approach of not having union etc. checks makes implementation-specific optimizations harder.
But implementations like Guile already optimize based on assert
and code path analysis.
Additional indirection function is optimizable enough.
Especially so in the context of the primitives this library provides.
Implementations may provide more optimized combinators (say, check-or
) on top of this library.
check-arg
and other primitives use SRFI-145 assume
instead of standard assert
whenever possible.
assume
is more strict than assert
, erroring in more cases.
Which highlights an important design decision: it's better to error more.
Errors highlight the broken contract, which is useful in development, prompting contract/code refinement.
And contract violations that happen at runtime can potentially break a lot.
That's why check-arg
uses assume
.
It's better to err than break silently.
One useful reference is Common Lisp: it allows annotating the &rest
argument with its element types.
Probably what any &rest
argument user wants.
(declare (ftype (function (number &rest number) number) +))
Such a feature is omitted from this SRFI (although implementations might add it if they wish to.) Both because it's not trivial to implement and because it asks for a "list-of" primitive (see (Design Decisions) Union/intersection/etc. Types), which is a non-goal complexity level.
This SRFI uses postfix types in derived macros like lambda-checked
and define-checked
.
Which is in contrast to languages like C, where types are (mostly) prefix.
The choice is subjective, but also consistent with what other Lisps tend to do.
Common Lisp defmethod
specifiers come after variable name.
In general, optional/special data comes after the argument name in procedure definition in most Lisps.
And modern C-family languages like Rust and Go embrace postfix types too.
More natural to read this way.
Name first, auxiliary information last.
Inconsistently enough, check-arg
and values-checked
put predicates/types first.
The heuristic here might be that names go before predicates (as in lambda-checked
args,)
while predicates go before values (as in values-checked
.)
Given that some of the provided primitives don't reference names, predicates go first, followed by values.
This section is mostly a result of review of all the accessible SRFIs. (At the moment of writing, 0-252.) And, occasionally, implementation manuals and (mostly Guile-based) real-world programs. There are emergent patterns worth optimizing, and that's what this SRFI does.
The code only accepts certain types of things.
The simplest way to ensure that the arguments are the right type of things is to check.
Which is what cond
is for:
(cond ((integer? x) (+ 1 x)) ((string? x) (+ 1 (string->number x))) (else ???))
This pattern has two problems:
One obvious action on data mismatch is to raise an error.
That's what the conventional check-*
procedures tend to do.
Check that conditions are satisfied and error out if they don't (1).
Something like what SRFI-144 does:
(define (check-flonum! name x) (if (not (flonum? x)) (error (string-append "non-flonum argument passed to " (symbol->string name)) x)))
The benefit of this pattern is that one can put a single check-*
form at the start of the procedure and be done with it (SRFI-144 again):
(define (flop1 name op) (lambda (x) (check-flonum! name x) (let ((result (op x))) (if (not (flonum? result)) (error (string-append "non-flonum result from " (symbol->string name)) result)) result)))
The most prevalent example of this strategy is the check-arg
, giving the name to one of the primitives in this SRFI.
It's present in SRFI-1, SRFI-13, SRFI-44, SRFI-152, and likely some others.
The canonical implementation (as found in SRFI-1) is
(define (check-arg pred val caller) (let lp ((val val)) (if (pred val) val (lp (error "Bad argument" val pred caller)))))
Another strategy of dealing with malformed data is trying to coerce/fix it (2).
That's what conventional ensure-*
functions do:
;; SRFI-175 (define (ensure-int x) (if (char? x) (char->integer x) x))
The benefit of the approach is that one is always certain the data is well-formed. Well, to a certain extent. Some data is too broken to be coerced, which raises a need for checking and erroring out.
Another way to check that things are in order (1) is using the R6RS assert
or SRFI-145 assume
.
Primitives allowing to check whether the given expression is true and error out if it isn't.
Same as check-*
procedures, but more generic.
assume
, in particular, is used in at least 9 SRFIs.
Mostly to do simple predicate checks.
Which highlights the need for predicate-checking primitive.
Maybe even the one tuned for performance, because performance is one of the motivations behind assume
SRFI.
One problem with assume
is that it gets too verbose at times.
To the point of being longer than the actual function body.
Take this example from SRFI-214:
(define (flexvector-ref fv index) (assume (flexvector? fv)) (assume (integer? index)) (assume (< -1 index (flexvector-length fv))) (vector-ref (vec fv) index))
So one needs some shorter (likely inlined in the procedure definition) way to provide types/predicates. Otherwise it's too much visual clutter.
Most big implementations caring for performance or correctness provide typing/checking APIs.
The problem of assume
verbosity from the previous section is solved at least by Bigloo, Chicken, Gambit/Gerbil, Kawa, and Typed Racket.
Either through inline types in procedure definition
;; Kawa/Bigloo (define (a x::integer) x) ;; Gambit (define-procedure (a (x integer)) x) ;; Typed Racket (define a (lambda ([x : Integer]) x))
or through the separate declaration forms
;; Chicken (: a (integer -> integer)) ;; Typed Racket (: a (-> Integer Integer))
With these ways to check procedures (3) for types (and sometimes predicates, with custom types some implementations provide) one gets 90% there. Procedures are almost all there is to Scheme, after all.
But some implementations aim for more, rivaling systems like Common Lisp or even statically typed languages.
Chicken provides the
for return type checking (2), and Kawa provides as
, which additionally coerces the values whenever possible.
Chicken also has the assume
form (not to be confused with SRFI-145) , which pins the variables to types for the duration of the body (1).
Chicken even has compiler-typecase
to cover type dispatch (4).
Chicken has lots of things, really.
Magnificent times we live in—Scheme implementations reaching the level of big ones.
Most of these things won't be a surprise after talking about how advanced Scheme implementations get. Still, doesn't hurt highlighting things that likely were an inspiration (from Common Lisp) and that enhance the state of the art (in Clojure) of type/predicate checks.
The main departure point for this library is Common Lisp.
Having assert
and check-type
for (1);
the
and coerce
for (2);
declare
for (3);
and (e|c|)typecase
(not even mentioning the object system and type-dispatched methods) for (4),
it's covering most of what one needs with types/predicates.
First-class-ish types, after all 🤷
Clojure, while being restricted by JVM and functional design, has type annotations, conveniently allowing one to attach types to values (2).
It also has pre/post conditions
for functions, allowing to check the argument validity (3).
And, finally, there's clojure.spec
that allows to define arbitrarily complex checks for data and match things to the contracts (1).
Lots of fun things.
??? credit where it is due. For example, please consider acknowledging people on the SRFI mailing list who have contributed to the discussion.
© 2024 Artyom Bologov.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.