Library Files
Derick Eddington
This SRFI is currently in "draft" status. To provide input on this SRFI, please mail to <srfi minus 103 at srfi dot schemers dot org>. Subscription instructions and an archive of previous messages are available for the mailing list.
This SRFI defines a standard for naming and finding files containing libraries. It is intended for implementations of the R6RS, and perhaps other Scheme dialects, so that libraries stored in contemporary file systems may be made available for importing.
The R6RS does not specify how libraries are to be made available for importing. In order for libraries to be portably organized, distributed, installed, and available for importing, using contemporary file systems, a standard is needed for naming and finding the files containing libraries.
R6RS library names and library references as defined by R6RS 7.1 are compound and consist of a list of one or more symbols and an optional last element of either a version or a version reference. The compound nature of the names allows for hierarchical grouping of libraries under shared name prefixes, which is useful for avoiding name conflicts with others' libraries and for organizing related libraries. A list of symbols closely matches a file system path because a path is a sequence of strings naming an entity in a hierarchy of directories. Most (if not all) implementations of R6RS had already exploited this obvious match and provided methods of finding library files based on libraries' names by using each symbol component as a path component. However, implementations' rules for naming and finding library files were not the same, and this lack of coherency began to frustrate attempts at distributing and using library files in a portable way.
Prior to this SRFI, most implementations expected each library file to contain only one library. However, at least one implementation, Larceny, supported files containing multiple libraries. Single-library files have advantages over multiple-library files: (1) the process for finding libraries is less complex; (2) people who want to find the files containing specific libraries do not have to hunt them down by identifying the many possibilities and checking each; (3) library file paths can be mapped to library names, so what library a file contains is known without looking in it, which is useful for managing library files; (4) the file organization abilities of editors, version control systems, and other tools can more naturally be used to organize work with particular libraries. For these reasons, this SRFI supports only single-library files.
Prior to this SRFI, implementations supported search paths, which are configurable path prefixes under which library files are looked for. Besides making library files' locations configurable, search paths make it possible to configure multiple implementations to use the same search path(s), and thus the same directories and files containing libraries. This avoids duplicating directories and files, and it lets library files be added or removed simply by putting them into or deleting them from search paths which all the implementations are already configured to use, with no further configuration or clean-up required. For these reasons, this SRFI has search paths and initializes them from an environment variable which all implementations use.
Symbols may contain characters which are not allowed in paths on some file systems, or which are interpreted specially in paths. Prior to this SRFI, implementations encoded some characters in library file paths, but their encoding schemes differed. This SRFI's encoding scheme requires encoding only four characters. It is intended to allow using all other characters a file system supports, instead of forcing some characters to be encoded unnecessarily, while still allowing library files to be exchanged with others who encode a different set of characters.
To support selecting implementation-specific libraries from among other libraries with the same names but for other implementations, prior to this SRFI, most (if not all) implementations had an additional rule for finding library files: the file name extension has an additional component, the name of the implementation, prepended to it. Files with the extension specific to the implementation in use and files with only the basic, unspecific extension are chosen, and files with an extension specific to other implementations are ignored. This allows structuring libraries so that portable interfaces to implementation-specific features can be made. It also allows structuring library files so that implementation-specific libraries are used in preference to generic libraries. Because the file name extension is used to accomplish this, files containing libraries with the same name can be kept in the same directory and kept organized with other related libraries. Because implementation-specific functionality is provided via the same library name and interface, importing source code remains portable and does not need to be changed to support new implementations; all that is required is to add a file for the new implementation. However, implementations which did not encode the extension separator character allowed conflicts with some possible library names. For these reasons, this SRFI has implementation-specific file name extensions and encodes the extension separator character.
R6RS library names may contain a version and library references may contain a version reference. This is useful for referencing libraries with a constraint on what versions are acceptable, which is useful when a library changes and users want to ensure only acceptable versions are used. Some users do not care for versions in library names or references. Prior to this SRFI, implementations' rules for naming and finding files containing versioned libraries were not the same. For these reasons, this SRFI has a scheme for versions in library file names and supports both using versions and not using them.
To support organizing all the files of libraries with a shared name prefix under the same directory, prior to this SRFI, some implementations had an additional rule for finding library files: an additional path component is implicitly appended to library file paths so that the last symbol of a library name corresponds to the last directory containing a library's file. E.g., a library named (acme) can have its file under the same "/search/path/acme" directory as the file for a library named (acme foo bar), or a library named (acme foo) can have its file in the same "/search/path/acme/foo" directory as the file for a library named (acme foo helper). This is useful for organizing library files, especially for distributing library collections (or sub-collections) as a single tree of directories and files all under one top directory named after the libraries' shared name prefix (which is often also the name of the collection). Also, this allows such library collections to be simply put in a search path directory and be immediately available. However, implementations' rules for implicit file names were not the same. For these reasons, this SRFI has implicit file names.
Implementations had not coordinated how they handled combinations of the above features, so they likely differed in how they handled various combinations, which might have caused further portability problems. This SRFI offers the above features in an integrated and standard way.
This SRFI requires library file paths to exactly and unambiguously represent the library name of the library a file contains. This supports: (1) mapping library file paths to library names, including library file paths which include a search path; (2) finding library files by only their paths, i.e., without needing to read the files; (3) multiple versions of a library having their files in the same directory; (4) extracting the search path from library file paths which include a search path. Points 1, 2, and 3 require including the version in the file name of libraries with a version in their name. Points 1 and 4 require search paths to be independent of each other.
A version reference may allow more than one version. A collection of import forms may contain multiple references to the same library but with different version constraints. The collection of library files available on a particular host may contain multiple versions of a library. Together, these facts cause a challenging problem: choosing a set of libraries which satisfies all the combined version constraints of all the import forms. The many possible combinations of version constraints and transitive imports, together with differences in the nature of different implementations, can make choosing a set of libraries quite complicated and make requiring a particular method undesirable. For this reason, this SRFI specifies an ordering of library files matching a library reference. This ordering is intended to be used as the precedence for choosing a match. This is intentionally left vague because implementations need the freedom to handle this issue in their own way. This SRFI's ordering is intended to be referred to by particular implementations in how they handle this issue. This ordering may also be useful for managing library files.
Library files are files which contain one library form as the first syntactic datum and whose path exactly represents the name of the contained library. Any additional contents after the first datum are ignored by this SRFI. Determining the character encoding of a file's contents is not handled by this SRFI.
A library file path consists of, in order: possibly a search path prefix, a sequence of path components corresponding to the sequence of symbols in the contained library's name, possibly an implicitly added last path component with prefix "^main^", possibly a version in the last path component if the library name has a version, possibly an implementation name in the last path component, and the file name extension "sls". Path components are separated by a platform-dependent character. The #\. character is used in the last path component to separate the prefix, the version, the version parts, the implementation name, and the "sls" extension.
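As a rough illustration of the layout just described, the following Python sketch assembles a relative library file path from its parts. It is not taken from this SRFI or its reference implementation: the function name relative_library_path is hypothetical, the symbols and implementation name are assumed to be already encoded, and "/" stands in for the platform-dependent path separator.

```python
def relative_library_path(symbols, version=None, impl=None, implicit=False):
    """Assemble a relative library file path (hypothetical helper).

    symbols  -- the library name's symbols, assumed already encoded
    version  -- a tuple of exact non-negative integers, or None
    impl     -- an already-encoded implementation name, or None
    implicit -- whether to append the implicit "^main^" component
    """
    components = list(symbols)
    if implicit:
        # The last symbol becomes a directory and "^main^" the file prefix.
        components.append("^main^")
    # Last component: prefix[.version parts][.implementation].sls
    last = [components.pop()]
    if version is not None:
        last.extend(str(n) for n in version)
    if impl is not None:
        last.append(impl)
    last.append("sls")
    components.append(".".join(last))
    # "/" is assumed as the path separator for this sketch.
    return "/".join(components)


print(relative_library_path(["bar", "zab"], version=(1, 2, 3)))  # bar/zab.1.2.3.sls
```

Prefixing a search path to the result gives an absolute library file path.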
A relative library file path is a library file path without a search path prefix. An absolute library file path is a library file path with a search path prefix.
Search paths are paths grouped in a sequence. They must name directories. When finding library files, they are prefixed to relative library file paths to form absolute library file paths which are possible locations of library files. Search paths must be independent, i.e., one cannot be a prefix of another, so that the paths of library files unambiguously represent library names. If the search paths were not independent, it could be ambiguous what library name a path represents. Search paths may share a prefix as long as their tails differ. E.g., paths "/foo/bar" and "/foo/bar/zab" are not independent, and so the library name represented by path "/foo/bar/zab/asdf.sls" is ambiguous because it might be (zab asdf) or (asdf), but paths "/foo/bar/blah" and "/foo/bar/zab" do not cause ambiguity and so are allowed.
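The independence requirement can be checked mechanically. The sketch below is one possible check, written in Python under the assumption that POSIX-style paths are compared component by component, without resolving symbolic links or relative paths; independent is a hypothetical name.

```python
from pathlib import PurePosixPath


def independent(search_paths):
    """Return True if no search path is a component-wise prefix of another."""
    parts = [PurePosixPath(p).parts for p in search_paths]
    for i, a in enumerate(parts):
        for j, b in enumerate(parts):
            # b starts with all of a's components, so a is a prefix of b.
            if i != j and b[:len(a)] == a:
                return False
    return True


print(independent(["/foo/bar", "/foo/bar/zab"]))       # False
print(independent(["/foo/bar/blah", "/foo/bar/zab"]))  # True
```

Comparing components rather than raw strings keeps paths such as "/foo/bar" and "/foo/barn" independent, since no file under "/foo/barn" can be read as a path under "/foo/bar".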
The operating system environment variable SCHEME_LIBRARY_SEARCH_PATHS, if it is defined, is used to initialize the search paths. Its value is a string containing a sequence of paths separated by a platform-dependent character. The order of the sequence is preserved in the sequence of search paths. Implementations may initialize the search paths to include additional paths before or after those from the environment variable, such as paths given on the command line, which come before, or system default paths, which come after.
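A minimal Python sketch of this initialization follows. It assumes that Python's os.pathsep (":" on POSIX systems, ";" on Windows) is the platform-dependent separator and that empty entries are skipped; neither detail is specified by this SRFI, and initial_search_paths is a hypothetical name.

```python
import os


def initial_search_paths(environ=None, before=(), after=()):
    """Initialize search paths from SCHEME_LIBRARY_SEARCH_PATHS.

    before -- e.g. paths given on the command line (higher precedence)
    after  -- e.g. system default paths (lower precedence)
    """
    if environ is None:
        environ = os.environ
    value = environ.get("SCHEME_LIBRARY_SEARCH_PATHS", "")
    # Assumption: os.pathsep is the separator; empty entries are dropped.
    from_env = [p for p in value.split(os.pathsep) if p]
    # The environment variable's order is preserved.
    return list(before) + from_env + list(after)
```

Passing a plain dictionary as environ makes the function easy to exercise without touching the real environment.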
The sequential ordering of search paths provides search path precedence which allows overlaying or overriding library files. Multiple searched directories can contain different files for the same libraries, and the files in directories with greater precedence have greater precedence. This can be exploited to extend and/or override the set of available library files, without modifying existing file system entities, by placing the new library files in similarly structured directory trees located under new search paths with greater precedence.
Encoding characters in components of relative library file paths is necessary to avoid conflicts with the characters this SRFI uses specially, and it is also done to support characters a file system may not support. The special characters of this SRFI are: #\%, the path separator, #\., and #\^. Technically, only the last path component needs all four characters to be encoded, and non-last path components need only #\% and the path separator to be encoded. However, all four characters are always encoded, to avoid creating an exceptional case for the last path component. Additional characters may also be encoded. Communicating what characters must be encoded for different file systems and coordinating transcoding of path names are not handled by this SRFI.
Characters are encoded using their UTF-8 encoding such that each UTF-8 byte is represented as two hexadecimal digits, with alphabetic digits in upper case, preceded by the #\% character. This type of encoding is chosen because it follows URI encoding. E.g., a library named (a%b c/d e.f g^h) may be in a file with path "/search/path/a%25b/c%2Fd/e%2Ef/g%5Eh.sls", and a library named (♥ λ) may be in a file with path "/search/path/♥/λ.sls" or "/search/path/%E2%99%A5/%CE%BB.sls".
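The encoding can be sketched in a few lines of Python. This is an illustration only: the names encode_component and decode_component are hypothetical, "/" is assumed to be the path separator, and the set of additional characters to encode is passed in explicitly because this SRFI leaves that choice to configuration. Decoding relies on urllib.parse.unquote, which decodes %XX sequences as UTF-8 and leaves malformed sequences unchanged.

```python
from urllib.parse import unquote

# The four special characters, assuming "/" is the path separator.
SPECIAL = frozenset("%/.^")


def encode_component(text, extra=frozenset()):
    """%-encode each UTF-8 byte of special (and any extra) characters."""
    out = []
    for ch in text:
        if ch in SPECIAL or ch in extra:
            # One "%XX" per UTF-8 byte, with upper-case hex digits.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def decode_component(text):
    """Decode %XX sequences as UTF-8, as URI decoding does."""
    return unquote(text)


print("/".join(encode_component(s) for s in ["a%b", "c/d", "e.f", "g^h"]))
# a%25b/c%2Fd/e%2Ef/g%5Eh
```

Encoding "♥" with extra={"♥"} yields "%E2%99%A5", matching the second path in the example above.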
This SRFI uses "sls" as the file name extension of library files, because "S.L.S." is the acronym of "Scheme library source". The extension may have an optional implementation-specific component, and this component should be the name of the Scheme implementation a library file is for. For implementation-specific library files, the implementation-specific component is prepended to ".sls" to form the complete extension. For library files not specific to an implementation, only "sls" is used. E.g., for an implementation named "acme", a library named (foo) may be in a file with path "/search/path/foo.acme.sls" or "/search/path/foo.sls", and a library named (foo.acme) may be in a file with path "/search/path/foo%2Eacme.acme.sls" or "/search/path/foo%2Eacme.sls". Encoding the #\. character avoids conflict because the files for the two libraries can be named "/search/path/foo.acme.sls" and "/search/path/foo%2Eacme.sls".
In order to avoid conflicts, the special characters of this SRFI and the characters #\0 through #\9 are always encoded in an implementation-specific component of an extension. Any additional characters configured to be encoded are also encoded. E.g., for an implementation named "a%b/c.d^e", a library named (foo) may be in a file with path "/search/path/foo.a%25b%2Fc%2Ed%5Ee.sls". This way, conflict with the special characters is avoided. For an implementation named "123", a library named (foo) may be in a file with path "/search/path/foo.%31%32%33.sls". This way, conflict with version components of file names is avoided. For an implementation named "それ", a library named (foo) may be in a file with path "/search/path/foo.それ.sls" or "/search/path/foo.%E3%81%9D%E3%82%8C.sls". This way, implementation names with characters not supported by a file system can be used.
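The extension rules above can be sketched as follows, again in Python and under stated assumptions: "/" is taken as the path separator, any additional characters to encode are passed in explicitly, the prefix is assumed to be already encoded, and encode_impl_name and file_name are hypothetical names.

```python
# Always encoded in an implementation-specific component: the four
# special characters (assuming "/" as the separator) and 0 through 9.
ALWAYS_ENCODED = frozenset("%/.^0123456789")


def encode_impl_name(name, extra=frozenset()):
    """Encode an implementation name for use in a file name extension."""
    return "".join(
        "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        if ch in ALWAYS_ENCODED or ch in extra else ch
        for ch in name)


def file_name(prefix, impl=None):
    """Join an already-encoded prefix, an optional implementation
    component, and the "sls" extension with #\\. separators."""
    parts = [prefix]
    if impl is not None:
        parts.append(encode_impl_name(impl))
    parts.append("sls")
    return ".".join(parts)


print(file_name("foo", "123"))  # foo.%31%32%33.sls
```

Because digits are always encoded here, an implementation component can never be mistaken for a version part.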
If a library's name includes a version, then its file name must include the version. Versions in library file paths are placed in the last path component after the file name prefix and before the file name extension, and they are separated from other file name components by the #\. character; their sub-parts are also separated by this character. E.g., a library named (foo (5)) may be in a file with path "/search/path/foo/^main^.5.sls", and a library named (bar zab (1 2 3)) may be in a file with path "/search/path/bar/zab.1.2.3.sls". Versions in file names are always distinguishable from implementation-specific components of extensions, because versions in file names must use only the characters #\0 through #\9 and #\., while implementation-specific components of extensions must encode these characters, so conflicts are not possible.
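Going the other way, the last path component of a library file path can be split back into its prefix, version, and implementation component. The Python sketch below is illustrative only: parse_file_name is a hypothetical name, it returns None for names that do not fit the layout, and it leaves the prefix and implementation component encoded. It relies on the guarantee just stated, that an implementation component contains no unencoded digits.

```python
DIGITS = frozenset("0123456789")


def is_version_part(s):
    """True if s is a non-empty run of the characters 0 through 9."""
    return s != "" and all(c in DIGITS for c in s)


def parse_file_name(name):
    """Split "prefix[.version parts][.impl].sls" into (prefix, version, impl).

    version is a tuple of integers or None; impl is the still-encoded
    implementation component or None. Returns None for other names.
    """
    parts = name.split(".")
    if len(parts) < 2 or parts[0] == "" or parts[-1] != "sls":
        return None
    prefix, middle = parts[0], parts[1:-1]
    impl = None
    # A trailing non-empty, non-numeric part must be the implementation.
    if middle and middle[-1] and not is_version_part(middle[-1]):
        impl = middle.pop()
    if not all(is_version_part(p) for p in middle):
        return None
    version = tuple(int(p) for p in middle) if middle else None
    return prefix, version, impl


print(parse_file_name("bar.1.0.acme.sls"))  # ('bar', (1, 0), 'acme')
```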
A library file without a version matches a library reference regardless of the reference's version reference (if everything else about the file's path matches). This does not conform to R6RS. The rationale is that it allows supplying a library without a version which fulfills a library reference regardless of its version reference. This is symmetrical with R6RS specifying that a library reference without a version reference matches a library with a version.
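The following Python sketch combines this rule with a small subset of the R6RS version-reference semantics: a version reference (r1 ... rn) matches a version (v1 ... vm) when m is at least n and each ri matches vi. It is a hypothetical illustration, not part of this SRFI: versions and references are modeled as Python tuples, only exact-integer sub-version references and (">=", n) / ("<=", n) pairs are supported, and the and, or, and not forms are omitted.

```python
def subversion_matches(ref, n):
    """Match one sub-version reference against one sub-version."""
    if isinstance(ref, int):
        return n == ref
    op, k = ref
    if op == ">=":
        return n >= k
    if op == "<=":
        return n <= k
    raise ValueError(f"unsupported sub-version reference: {ref!r}")


def version_matches(version_ref, file_version):
    """file_version is None for a library file without a version."""
    if file_version is None:
        # This SRFI's rule: an unversioned file matches any version reference.
        return True
    if len(file_version) < len(version_ref):
        return False
    return all(subversion_matches(r, n)
               for r, n in zip(version_ref, file_version))
```

Under this sketch, the reference (1) matches files with versions 1, 1.0, and 1.2, as well as unversioned files, but not files with version 2.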
An implicit file name is a library file path with a last path component with prefix "^main^". This is considered implicit because it is not derived from a library name. Implicit file names allow the last directory containing a library file to be named according to the last symbol of a library name. There are two possible paths under a search path for a file for a library: one with the implicit component and one without. The prefix of the implicit component must avoid conflicts with library names. This is accomplished by encoding #\^ characters of relative library file paths. E.g., a library named (foo) may be in a file with path "/search/path/foo/^main^.sls" or "/search/path/foo.sls", and a library named (foo ^main^) may be in a file with path "/search/path/foo/%5Emain%5E/^main^.sls" or "/search/path/foo/%5Emain%5E.sls". This way, conflict is avoided because the files for the two libraries can be named "/search/path/foo/^main^.sls" and "/search/path/foo/%5Emain%5E.sls".
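The two possible locations of a library's file under a search path can be computed as (directory, file name prefix) pairs. This Python sketch is hypothetical: the names encode and candidate_locations are illustrative, "/" is assumed to be the path separator, and the implicit location is listed first, following the precedence this SRFI gives implicit file names within a search path.

```python
def encode(symbol):
    """%-encode the four special characters (assuming "/" as the separator)."""
    return "".join(
        "".join(f"%{b:02X}" for b in ch.encode("utf-8")) if ch in "%/.^" else ch
        for ch in symbol)


def candidate_locations(symbols):
    """Return (directory, file name prefix) pairs for a library name,
    implicit location first, as relative "/"-separated paths."""
    encoded = [encode(s) for s in symbols]
    implicit = ("/".join(encoded), "^main^")
    explicit = ("/".join(encoded[:-1]), encoded[-1])
    return [implicit, explicit]


print(candidate_locations(["foo", "^main^"]))
# [('foo/%5Emain%5E', '^main^'), ('foo', '%5Emain%5E')]
```

Because the library name's own #\^ characters are encoded, the literal "^main^" prefix can only come from the implicit rule.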
The prefix "^main^" is chosen because: implicit components are usually used for libraries which are the main library of a group; a protector character which is encoded is used to avoid conflicts, as just shown; "^main^" visually stands out; and the #\^ character is not common in library name symbols and does not require escaping in common command shells.
A library reference may match multiple files which each contain an implementation of the library. This SRFI specifies an ordering for all the possibilities of matching files. Scheme implementations should use this ordering as the precedence for choosing a match. This SRFI does not specify which match is chosen, because choosing the files to use for a set of transitive imports involves implementation-dependent complications and choosing the matches which satisfy all the version constraints needs to be left to each implementation.
Multiple files matching a library reference is possible because of: multiple search paths each containing matches, implicit file name matches and non-implicitly-named matches both existing, multiple versions of the library matching the library reference's version reference, and implementation-specific and unspecific matches both existing. This SRFI uses a multi-level ordering to deal with all these aspects. The first level is search paths, the second level is implicit naming of files, the third level is versions, and the fourth level is implementation-specific files.
Matches in a search path which is ordered before another search path are ordered before matches in the other search path.
Within the same search path, implicit file name matches are ordered before non-implicitly-named matches.
Within the same directory, matches are ordered first by version and second by implementation specificity. A match without a version is ordered before a match with a version, and a match with a greater version is ordered before a match with a lesser version, regardless of whether either match has the implementation-specific file name extension. Matches with a version are ordered by the usual version ordering. E.g., the version 2 is greater than 1.2.3, which is greater than 1.2.0, which is greater than 1.2. Matches with the same version, or no version, are relatively ordered by whether or not they are specific to the implementation; the match which is specific is ordered before the match which is not. (Ordering files which are specific to different implementations is not necessary because the only files which can match are those which are not specific or are specific to the implementation being used.)
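The within-directory ordering can be expressed as a comparison function. In this hypothetical Python sketch, each match is modeled as a (version, specific) pair, where version is a tuple of integers or None. Python's built-in tuple comparison is lexicographic and treats a proper prefix as smaller, which coincides with the usual version ordering described above.

```python
from functools import cmp_to_key


def compare_matches(a, b):
    """Return a negative number if match a precedes match b."""
    (va, sa), (vb, sb) = a, b
    if va != vb:
        if va is None:        # no version: ordered first
            return -1
        if vb is None:
            return 1
        # Greater versions first; e.g. (1, 2, 0) > (1, 2) in Python.
        return -1 if va > vb else 1
    # Same version, or both unversioned: specific before unspecific.
    return (0 if sa else 1) - (0 if sb else 1)


matches = [((1,), False), ((1, 2), False), (None, False),
           ((1,), True), (None, True), ((1, 0), True)]
print(sorted(matches, key=cmp_to_key(compare_matches)))
# [(None, True), (None, False), ((1, 2), False), ((1, 0), True), ((1,), True), ((1,), False)]
```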
E.g., given the search paths, in order,

    spd
    s/p/c
    spb
    /s/p/a

and the following files:

    /s/p/a/
      foo/
        bar.1.0.acme.sls
        bar.1.2.other.sls
        bar.1.2.sls
        bar.1.acme.sls
        bar.1.sls
        bar.2.acme.sls
        bar.2.sls
        bar.acme.sls
        bar.other.sls
        bar.png
        bar.sls
        zab.sls
        bar/
          ^main^.1.9.acme.sls
          ^main^.sls
          blah.sls
    s/p/c/
      foo/
        bar.1.1.sls
        bar.3.sls
        bar.other.sls
        bar/
          ^main^.2.sls
          ^main^.other.sls
    spb/
      foo/
        blah.sls
        zab.sls
        bar/
          ^main^.0.7.acme.sls
          ^main^.0.9.sls
          ^main^.1.0.sls
          ^main^.1.2.acme.sls
          ^main^.other.sls
          ^main^.png
          zab.sls
    spd/
      foo/
        it.sls
        bar/
          thing.sls

the ordering of the matches for the library reference (foo bar (1)), for an implementation named "acme", is:

    s/p/c/foo/bar.1.1.sls
    spb/foo/bar/^main^.1.2.acme.sls
    spb/foo/bar/^main^.1.0.sls
    /s/p/a/foo/bar/^main^.sls
    /s/p/a/foo/bar/^main^.1.9.acme.sls
    /s/p/a/foo/bar.acme.sls
    /s/p/a/foo/bar.sls
    /s/p/a/foo/bar.1.2.sls
    /s/p/a/foo/bar.1.0.acme.sls
    /s/p/a/foo/bar.1.acme.sls
    /s/p/a/foo/bar.1.sls
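The ordering in this example can be reproduced mechanically. The following Python sketch is a simplified model, not the reference implementation: the file system is modeled as a dictionary from each search path to the relative file paths under it, only exact-integer version references are supported, the names are assumed to need no encoding, and all function names are hypothetical. Applied to the example's search paths and files, it lists the same eleven paths in the same order.

```python
from functools import cmp_to_key

DIGITS = frozenset("0123456789")


def is_version_part(s):
    return s != "" and all(c in DIGITS for c in s)


def parse_file_name(name):
    """Return (prefix, version, impl) for "prefix[.version][.impl].sls", else None."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] == "" or parts[-1] != "sls":
        return None
    prefix, middle, impl = parts[0], parts[1:-1], None
    if middle and middle[-1] and not is_version_part(middle[-1]):
        impl = middle.pop()
    if not all(is_version_part(p) for p in middle):
        return None
    return prefix, (tuple(int(p) for p in middle) if middle else None), impl


def version_matches(ref, version):
    # Simplification: ref is a tuple of exact integers; unversioned files match.
    return version is None or version[:len(ref)] == tuple(ref)


def compare(a, b):
    """Unversioned first, then greater versions first, then specific first."""
    (va, sa), (vb, sb) = a, b
    if va != vb:
        if va is None or vb is None:
            return -1 if va is None else 1
        return -1 if va > vb else 1
    return (0 if sa else 1) - (0 if sb else 1)


def ordered_matches(search_paths, files, symbols, ref, impl):
    """List matching files in precedence order (symbols already encoded)."""
    locations = [("/".join(symbols), "^main^"),              # implicit first
                 ("/".join(symbols[:-1]), symbols[-1])]
    result = []
    for sp in search_paths:                                  # level 1: search paths
        for directory, prefix in locations:                  # level 2: implicit naming
            level = []
            for path in files.get(sp, ()):
                d, _, name = path.rpartition("/")
                parsed = parse_file_name(name)
                if d != directory or parsed is None:
                    continue
                p, version, file_impl = parsed
                if p == prefix and file_impl in (None, impl) and version_matches(ref, version):
                    level.append(((version, file_impl is not None), sp + "/" + path))
            # Levels 3 and 4: version, then implementation specificity.
            level.sort(key=cmp_to_key(lambda x, y: compare(x[0], y[0])))
            result.extend(full for _, full in level)
    return result


SEARCH_PATHS = ["spd", "s/p/c", "spb", "/s/p/a"]
FILES = {
    "/s/p/a": ["foo/" + n for n in [
        "bar.1.0.acme.sls", "bar.1.2.other.sls", "bar.1.2.sls", "bar.1.acme.sls",
        "bar.1.sls", "bar.2.acme.sls", "bar.2.sls", "bar.acme.sls",
        "bar.other.sls", "bar.png", "bar.sls", "zab.sls",
        "bar/^main^.1.9.acme.sls", "bar/^main^.sls", "bar/blah.sls"]],
    "s/p/c": ["foo/" + n for n in [
        "bar.1.1.sls", "bar.3.sls", "bar.other.sls",
        "bar/^main^.2.sls", "bar/^main^.other.sls"]],
    "spb": ["foo/" + n for n in [
        "blah.sls", "zab.sls", "bar/^main^.0.7.acme.sls", "bar/^main^.0.9.sls",
        "bar/^main^.1.0.sls", "bar/^main^.1.2.acme.sls", "bar/^main^.other.sls",
        "bar/^main^.png", "bar/zab.sls"]],
    "spd": ["foo/it.sls", "foo/bar/thing.sls"],
}

for match in ordered_matches(SEARCH_PATHS, FILES, ["foo", "bar"], (1,), "acme"):
    print(match)
```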
SRFI 104: Library Files Utilities is a companion to this SRFI. It is the reference implementation of this SRFI; it is intended as a means for Scheme implementations to support this SRFI, and as a library for users working with library files. It is separate from this SRFI so that this SRFI remains abstract: it does not require providing a library API, and it may be provided without providing the companion SRFI. The companion SRFI is intended to be useful on implementations supporting this SRFI, whether or not they use it to implement this SRFI. E.g., it can be used to transcode path names when exchanging library files between different file systems.
(Section which points out things to be resolved. This will not appear in the final SRFI.)
Should R6RS's peculiar version reference syntax and semantics be used, or should a more straightforward design be used instead? This SRFI already differs from R6RS in that a non-empty version reference will match a library with no declared version.
TODO: Anything else?
I thank Aziz Ghuloum and Will Clinger for starting the implementation-specific file name extension idea. I thank PLT Scheme for the ideas of implicit main files and URI-style UTF-8 encoding. I thank Michael Sperber for convincing me to not specify choosing what files to use for a full transitive import set. I thank all those who participated during the draft period of this SRFI and all those who participated in earlier discussions in various other forums. I thank David Van Horn for editing this SRFI and for suggesting it be separated from its companion SRFI.
Copyright (C) Derick Eddington (2009). All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.