Title

Running Scheme Scripts on Unix

Authors

Martin Gasbichler and Michael Sperber

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-22@srfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Draft: 2001-03-08--2001-06-08
Revised: 2001-03-20
Revised: 2001-04-30
Revised: 2001-06-11
Revised: 2001-06-22
Revised: 2001-08-06
Final: 2002-01-20

Abstract

This SRFI describes basic prerequisites for running Scheme programs as Unix scripts in a uniform way. Specifically, it describes:

the syntax of Unix scripts written in Scheme,
a uniform convention for calling the Scheme script interpreter, and
a method for accessing the Unix command line arguments from within the Scheme script.

Rationale

A user, given a Scheme program, has no standard way of running it, even if it is a single file written in R5RS-conformant Scheme and the underlying platform is known to be Unix. Almost every Scheme implementation provides an executable capable of starting up the Scheme system and loading a particular file, but few pairs of Scheme implementations share a convention for this.

This lack of de-facto standardization makes it impossible to write even a simple end-user program without also shipping a particular Scheme implementation with it or providing elaborate implementation-specific machinery. This SRFI describes a set of conventions which allow the creation of portable Unix scripts written in Scheme.

Unfortunately, the set of existing conventions among Scheme implementations makes it impossible to formulate these conventions in such a way as to remain compatible with all existing solutions to the problem. The Design Rationale section gives a brief overview.

Specification

Script Syntax

<script> --> <script prelude>? <program>
<script prelude> --> #! <space> <any character that isn't a line break>* <line break>

The <script prelude> line may not be longer than 64 characters. (See "Portability" under Design Rationale for the reasoning behind this limit.)
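The grammar and the length limit can be checked mechanically. The following is a minimal sketch in R5RS Scheme, not part of this SRFI: the names max-prelude-length and valid-prelude? are hypothetical, and the sketch assumes that the line is passed without its line break and that the 64-character limit does not count the line break.

```scheme
;; Hypothetical helper, not defined by SRFI 22.
;; Assumption: LINE is the first line of a script without its line break,
;; and the 64-character limit does not count that line break.
(define max-prelude-length 64)

;; True if LINE starts with "#!" followed by a space and fits the limit.
(define (valid-prelude? line)
  (and (>= (string-length line) 3)
       (<= (string-length line) max-prelude-length)
       (string=? (substring line 0 3) "#! ")))
```

Under these assumptions, (valid-prelude? "#! /usr/bin/env scheme-r5rs") is true, while (valid-prelude? "#!/usr/bin/env scheme-r5rs") is false because the space after #! is missing.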

Script Interpreter Invocation

Systems supporting this SRFI provide a selection of binary executables called Scheme script interpreters depending on the language dialects they support. They provide any of scheme-rnrs, scheme-ieee-n-y, scheme-srfi-0, and scheme-srfi-7 in the regular path. The invocation syntax for these interpreters is always as follows:

<executable> <file> <argument> ...

It is recommended that the Scheme script interpreter reside somewhere in the standard Unix path. Moreover, the recommended way to invoke the Scheme script interpreter from the script is via a /usr/bin/env trampoline, like this:

#! /usr/bin/env <executable>

Semantics

A Scheme script interpreter loads the file specified by <file>. It ignores the script prelude and interprets the rest of the file according to the language dialect specified by the name of the interpreter.

The Scheme script interpreter may also load a different file after making a reasonable check that loading it is semantically equivalent to loading <file>. For example, the script interpreter may assume that a file with a related name (say, with an additional extension) is a compiled version of <file>. (See also below under "Compilability".)

scheme-rnrs expects code written in RnRS Scheme. scheme-ieee-n-y expects code written in IEEE n-y Scheme. Specifically, scheme-r4rs expects code written in R4RS Scheme, scheme-r5rs expects code written in R5RS Scheme, and scheme-ieee-1178-1990 expects code written in IEEE 1178-1990 Scheme. scheme-srfi-0 expects code written in R5RS Scheme using the extensions specified in SRFI 0. scheme-srfi-7 expects code written in R5RS Scheme using the extensions specified in SRFI 7.

Upon invocation of a script, the Scheme system calls a procedure named main with one argument, a list of strings containing the Unix command-line arguments to the script, i.e. the elements of the argv vector of the Scheme script interpreter process from index 1 on. Thus, the first element of the list is the name of the script.

The main procedure should return an integer which becomes the exit status of the script.
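As an illustration of the argument list and the exit status, here is a small sketch of a main procedure. The script name echo.scm used in the usage note and the choice of 1 as the failure status are assumptions made for this example only.

```scheme
;; Illustrative main procedure: prints each operand on its own line.
;; ARGUMENTS is the list described above, so (car arguments) is the
;; script name and (cdr arguments) holds the remaining arguments.
(define (main arguments)
  (let ((script-name (car arguments))
        (operands (cdr arguments)))
    (if (null? operands)
        (begin
          (display "usage: ")
          (display script-name)
          (display " <word> ...")
          (newline)
          1)                        ; non-zero exit status: nothing to print
        (begin
          (for-each (lambda (word)
                      (display word)
                      (newline))
                    operands)
          0))))                     ; exit status 0: success
```

Invoked as scheme-r5rs echo.scm hello world, a conforming interpreter would call (main '("echo.scm" "hello" "world")) and exit with status 0.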

When an error is signalled during the execution of the script (in the sense of R5RS, Section 1.3.2), the script returns immediately with the value of the C sysexits.h macro EX_SOFTWARE as its exit status, or 70 if sysexits.h is unavailable.

Should main return anything other than an integer which would be a valid exit status, the script also returns EX_SOFTWARE.

In the above error situations, implementations are encouraged to display a meaningful error message on stderr.
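The rule for the return value of main can be sketched as a small mapping procedure. This is only an illustration: result->exit-status is a hypothetical name, and the sketch assumes that a valid exit status is an exact integer between 0 and 255, which this SRFI does not spell out.

```scheme
;; Fallback value of EX_SOFTWARE given in the text above.
(define ex-software 70)

;; Hypothetical helper: maps the value returned by main to an exit status.
;; Assumption: a valid exit status is an exact integer in the range 0..255.
(define (result->exit-status result)
  (if (and (integer? result)
           (exact? result)
           (<= 0 result 255))
      result
      ex-software))
```

With this sketch, a main that returns 0 yields exit status 0, while a main that returns a string, an inexact number, or an out-of-range integer yields 70.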

If the script interpreter allows the script access to the environment (via a future SRFI yet to be written), the environment seen by the script must be identical to that of the script interpreter upon its invocation.

A Scheme implementation supporting this SRFI does not have to provide all of these script interpreters. However, Scheme implementations are encouraged to provide scheme-ieee-1178-1990 if they implement IEEE 1178-1990 or R5RS, scheme-ieee-n-y if they implement another IEEE standard for Scheme, scheme-rnrs if they implement RnRS for n>=4, scheme-srfi-0 if they implement SRFI 0, and scheme-srfi-7 if they implement SRFI 7.

In the case of scheme-srfi-7 all specifications of filenames (marked by <filename> in the syntax of SRFI 7) are string literals containing Unix-style filenames which are absolute or relative to the directory the script resides in.
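One way an implementation might resolve such filenames is sketched below. The procedures directory-part and resolve-script-filename are hypothetical names introduced for this example, and the sketch assumes that the path of the script is available as a string.

```scheme
;; Hypothetical helper: the directory part of PATH including the trailing
;; slash, or the empty string if PATH contains no slash.
(define (directory-part path)
  (let loop ((i (- (string-length path) 1)))
    (cond ((< i 0) "")
          ((char=? (string-ref path i) #\/) (substring path 0 (+ i 1)))
          (else (loop (- i 1))))))

;; Absolute filenames are returned unchanged; relative ones are taken
;; relative to the directory in which the script resides.
(define (resolve-script-filename script-path filename)
  (if (and (> (string-length filename) 0)
           (char=? (string-ref filename 0) #\/))
      filename
      (string-append (directory-part script-path) filename)))
```

For a script at /home/me/bin/tool.scm, the filename "lib/util.scm" resolves to "/home/me/bin/lib/util.scm", while "/opt/lib/util.scm" is left as it is.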

Interactive Loading of Scripts

Scheme implementations with an interactive development environment which support SRFI 22 are encouraged to also support loading Scheme scripts into that environment.

Compilability

Programmers who want their scripts to be compilable to native code are encouraged to provide an initial invocation line of the format

#! ... <executable>

<executable> is the name of one of the script interpreters from the above list; it may carry a directory prefix, as in /usr/local/bin/scheme-r5rs.

It is expected that Scheme systems supporting compilation to native executables will use the first such line appearing in a script to determine the language dialect.
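A compiler could recover the interpreter name from such a line roughly as follows. This is a sketch under stated assumptions, not a prescribed algorithm: suffix-after-last and invocation-line-interpreter are hypothetical names, and the line is assumed to carry no trailing whitespace or line break.

```scheme
;; Hypothetical helper: the part of STR after the last character that
;; satisfies DELIMITER?, or all of STR if no character does.
(define (suffix-after-last str delimiter?)
  (let loop ((i (- (string-length str) 1)))
    (cond ((< i 0) str)
          ((delimiter? (string-ref str i))
           (substring str (+ i 1) (string-length str)))
          (else (loop (- i 1))))))

;; Returns the interpreter name at the end of an invocation line,
;; without any directory prefix.
;; Assumption: LINE has no trailing whitespace or line break.
(define (invocation-line-interpreter line)
  (suffix-after-last line
                     (lambda (c)
                       (or (char=? c #\/)
                           (char-whitespace? c)))))
```

Both "#! /usr/bin/env scheme-r5rs" and "#! /usr/local/bin/scheme-r5rs" yield "scheme-r5rs", from which the language dialect follows.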

Example

Here is a Scheme version of the Unix cat utility:

#! /usr/bin/env scheme-r5rs

(define (main arguments)
  (for-each display-file (cdr arguments))
  0)

(define (display-file filename)
  (call-with-input-file filename
    (lambda (port)
      (let loop ()
        (let ((thing (read-char port)))
          (if (not (eof-object? thing))
              (begin
                (write-char thing)
                (loop))))))))

Design Rationale

Most Unix Scheme implementations support writing Unix scripts in one form or another. Unfortunately, the invocation syntax as well as the syntax of the script itself vary from one implementation to another. However, the design decisions for this SRFI were made with some care:

Script Interpreters must be Binaries

Script interpreters adhering to this SRFI must be binary executables. They cannot be shell scripts because that would preclude them from being used directly in the invocation line of the prelude: most Unix variants require the script interpreter in the invocation line to be a binary.

Optional Invocation Line

The invocation line is optional to make it possible to write scripts which are standard Scheme files and to run them by invoking the Scheme script interpreter explicitly. This could make it possible (provided future script SRFIs for other environments follow the example) to write, say, makefiles which are portable among different environments.

Absolute Script Location vs. Trampoline

This SRFI specifies a name but not an absolute location for the Scheme script interpreter. Since most Unix implementations require the interpreter in a script to be an absolute filename, the only way to portably start the interpreter is by calling a standard Unix program such as /usr/bin/env (a so-called trampoline) as shown in the example.

This is because there is no well-established convention for the location of third-party software on Unix systems. A convention common on one system or in one environment might be unimplementable in another. Moreover, trampolines are generally very cheap on Unix, and it is expected that the cost of Scheme script interpretation and execution will almost always dominate the cost of the trampoline.

Portability

Portability is a relative term in the context of this SRFI: Posix and The Single Unix Specification do not guarantee any method for automatic script execution, even though most Unix implementations support the #! convention. Moreover, neither of the two guarantees the presence of an env executable in /usr/bin. However, a wide range of systems do. The discussion archive contains logs from a number of systems.

A more pertinent portability issue is the length of the script invocation line. Until recently, a number of Unix variants imposed a 32-character limit on that line. However, this limit seems to have been raised to 64 or to have disappeared in more recent versions of most of these systems (in fact, in all that we tested). Again, the discussion archive contains more data. Since descriptive names for the Scheme script interpreters tend to exceed the 32-character limit on invocation lines, the limit in this SRFI is 64.

Implicit vs. Explicit Command-Line-Parameter Access

This SRFI specifies that the Scheme script interpreters will communicate the command-line arguments to the script as a list argument to the main procedure. Some Scheme implementations use a special global variable that holds the arguments. It is not clear that one alternative is inherently preferable to another. Neither is it clear whether a vector or a list is the more natural data structure.

However, explicitly specifying an entry point has the advantage that scripts are easier to debug with a REPL-type Scheme implementation: main can be called explicitly from the REPL, with the same effect as running the script through the script interpreter.
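For instance, a script's main can be exercised from the REPL with a hand-built argument list. The script below and its name count-args.scm are made up for this illustration.

```scheme
;; Contents of a hypothetical script count-args.scm: prints the number
;; of arguments that follow the script name.
(define (main arguments)
  (display (length (cdr arguments)))
  (newline)
  0)

;; At the REPL, after loading the file, the call below stands in for the
;; command line "count-args.scm a b c".
(main '("count-args.scm" "a" "b" "c"))
```

The call prints 3 and returns 0, the value that would become the exit status of the script.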

Command-Line Parameters as Arguments vs. as a Data Structure

There were several discussions on the mailing list about whether to pass command-line parameters as separate arguments to the main procedure or in a data structure. Both offer slight usability advantages depending on context. However, passing the parameters as separate arguments opens up a conflict between semantic correctness ("What should be the exit code of a script with an arity error in the call to main?") and ease of implementation (see this message and the subsequent discussion).

Positional Command-Line Arguments vs. Explicit Switches

A previous draft of this SRFI required that language dialect, entry procedure, and script filename be specified via Unix-style command-line switches. This requires additional command-line parsing machinery. Moreover, this precludes starting the script interpreter directly instead of going through a trampoline on Unix systems which allow only one command-line argument to the script interpreter.

Windows Compatibility

It seems that Windows script syntax is fundamentally incompatible with Unix script syntax, so it is impossible to write a single file which will run as a script on both Unix and Windows. (See also Marc Feeley's message on the subject.) It is feasible to also specify an alternate script syntax which will work on Windows, but this is outside the scope of this SRFI. However, Windows allows associating an executable with a file extension, which might make Unix Scheme scripts runnable on a Windows system. (See Eli Barzilay's message on the subject.)

Implementation

An implementation of this SRFI necessarily depends heavily on the underlying Scheme system. Moreover, it should be clear that implementing this SRFI is not very difficult in any Scheme implementation callable from a Unix shell. Therefore, this SRFI contains no reference implementation.

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Shriram Krishnamurthi