Overuse of strings
One mistake in the XML world is the overuse of strings to represent
structured data. Scheme has tended to be able to avoid this with its
relatively lightweight notation for lists, trees and symbols.
I feel that this SRFI abuses strings in places: in the language name
"scheme://r6rs", and in the definition of <lib-path>. Instead, these
two should be S-expressions.
URIs are commonly written as a single string, such as
"http://www.example.com/some/path?param=value", requiring a
sophisticated parser to recover the structure embedded in the string.
A Scheme system might represent such a URI as
(http "www.example.com" "some" "path" (param "value"))
or even
(http (www example com) some path (param value))
requiring no specialised character-level parsing.
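To illustrate the point, here is a minimal sketch of how the first S-expression form above could be taken apart with ordinary list operations. The accessor names (uri-host, uri-segments, uri-params, uri-param) are hypothetical, and the layout (scheme, then host, then path segments, with parameters as sublists) is only an assumption read off that one example.

```scheme
;; Hypothetical accessors for a URI held as
;;   (scheme host segment ... (key value) ...)
;; The layout is an assumption based on the example in the message.
(define (uri-scheme u) (car u))
(define (uri-host u) (cadr u))

;; Path segments: the non-list elements that follow the host.
(define (uri-segments u)
  (let loop ((rest (cddr u)) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((pair? (car rest)) (loop (cdr rest) acc))
          (else (loop (cdr rest) (cons (car rest) acc))))))

;; Parameters: the list-valued elements that follow the host.
(define (uri-params u)
  (let loop ((rest (cddr u)) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((pair? (car rest)) (loop (cdr rest) (cons (car rest) acc)))
          (else (loop (cdr rest) acc)))))

;; Look up one parameter by its symbolic key; #f if absent.
(define (uri-param u key)
  (let ((hit (assq key (uri-params u))))
    (if hit (cadr hit) #f)))

(define example
  '(http "www.example.com" "some" "path" (param "value")))

(uri-host example)          ; "www.example.com"
(uri-segments example)      ; ("some" "path")
(uri-param example 'param)  ; "value"
```

Nothing here looks at individual characters; car, cdr and assq are enough.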
As another example, the XPath language has a complicated grammar for
expressions. XPath expressions are often embedded in attributes of XML
documents (I'm thinking in particular of XSLT here). Quite aside from
the ridiculous problems this causes relating to XML Namespaces,
embedding structure in strings causes problems for code-processing
tools, which again have to have heavy-duty parsers to extract the
required information from its string encoding. Sometimes parsing
technology simply isn't available, and baroque staged-evaluation-like
solutions have to be put in place - all for the lack of consistent use
of the facilities for representing structure offered by core XML.
So here's an alternative suggestion for the definitions of <lib-path>
and the language name:
A <lib-path> is therefore represented by a list of path components,
or a symbol shortcut (defined below). Each path component must be an
atom. The first path component (the primary component) is the most
significant. The primary component ``scheme'' should be used to
refer to libraries that are released by the Scheme community at
large, in much the same way that a Java package name is used. For
example, (scheme "acme.com" wiley quicksort) might refer to a
quicksort library that is distributed by Wiley at Acme. If Wiley's
quicksort library contains the <lib-path> "utils", it can be
expanded as (scheme "acme.com" wiley utils).
The <lib-path> (scheme r6rs) must be supported by every
implementation, and it must export all bindings in R5RS, plus
import, export, and indirect-export. Lib-paths starting with (scheme
rNrs) and (scheme srfi-N) are reserved for definition by standards
The way that <lib-path>s are mapped to library declarations is
implementation-specific. Implementations might, for example,
interpret a lib-path with a ``uri'' primary component as a reference
to a library within a UTF-8-encoded package on the web. [...]
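As a rough sketch, the shape of a <lib-path> suggested above could be checked with a predicate like the following. The names lib-path? and path-component? are hypothetical. Reading "atom" as "symbol or string" is an assumption, as is accepting any bare symbol as the symbol shortcut, since the shortcut is not defined in this message.

```scheme
;; Assumption: a path component ("atom") is a symbol or a string.
(define (path-component? x)
  (or (symbol? x) (string? x)))

;; True if every element of lst satisfies pred.
(define (every-element? pred lst)
  (or (null? lst)
      (and (pred (car lst))
           (every-element? pred (cdr lst)))))

;; Hypothetical check: a lib-path is a bare symbol (the shortcut,
;; assumed here to be any symbol) or a non-empty proper list of
;; path components.
(define (lib-path? x)
  (or (symbol? x)
      (and (pair? x)
           (list? x)
           (every-element? path-component? x))))

(lib-path? '(scheme "acme.com" wiley quicksort))  ; #t
(lib-path? '(scheme (nested list) quicksort))     ; #f
(lib-path? '())                                   ; #f
```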
Actual references to HTTP-resident library bodies could perhaps be
built from base URIs and path-component offsets, as in
(uri <base-uri> <component> ...)
so that
(uri "http://example.com/" foo bar)
might be interpreted as "http://example.com/foo/bar.scm". Perhaps such
interpretations of primary components are best left to later SRFIs,
since otherwise we start down the slippery slope of pulling in the
rest of the world's container- and transport-definitions.
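For concreteness, one possible reading of the (uri <base-uri> <component> ...) form is sketched below. The procedure name uri-lib-path->url is hypothetical. Joining components with "/" and appending ".scm" are assumptions taken from the single example above, and the sketch assumes the base URI already ends in "/" and that at least one component is given.

```scheme
;; Components may be symbols or strings; normalise to strings.
(define (component->string c)
  (if (symbol? c) (symbol->string c) c))

;; Join a non-empty list of strings with "/" between them.
(define (join-with-slash strings)
  (if (null? (cdr strings))
      (car strings)
      (string-append (car strings)
                     "/"
                     (join-with-slash (cdr strings)))))

;; Hypothetical mapping from (uri <base-uri> <component> ...) to a
;; URL string. The ".scm" suffix is an assumption, not a rule.
(define (uri-lib-path->url lib-path)
  (let ((base (cadr lib-path))
        (components (map component->string (cddr lib-path))))
    (string-append base (join-with-slash components) ".scm")))

(uri-lib-path->url '(uri "http://example.com/" foo bar))
;; => "http://example.com/foo/bar.scm"
```

The only string operation needed is concatenation; the structure itself stays in the list.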
Questions and possible variations:
1. should path components be restricted further to being only
symbols, or only strings? If we restrict ourselves to symbols
only, we end up with names like
(scheme com acme wiley quicksort) ;; more like Java
which seems fine and forces structure out into the open where it
belongs; on the other hand, if strings are disallowed as path
components, lib-paths like (uri "http://example.com/" library),
which contain structure that raw symbols cannot represent, are
disallowed. Perhaps this is a good thing, but probably not.
2. should paths be big-endian, or little-endian? Compare
(scheme "acme.com" wiley quicksort)
(scheme "acme.com" wiley utils)
with
(quicksort wiley "acme.com" scheme)
(utils wiley "acme.com" scheme)
3. should the "scheme" primary component be reserved only for language
definitions? It seems redundant to require it for normal libraries:
(com acme wiley quicksort)
(com acme wiley utils)
4. is the mention of R5RS a typo in the "Library References" section?
5. how might relative lib-paths be accommodated in an S-expression
alternative to URIs? Perhaps with "(... utils)"?
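As a sketch of question 5, a relative reference written as (... utils) could be resolved against the lib-path of the library that contains it. The procedure names are hypothetical, and treating the symbol ... as a "sibling of the containing library" marker is only one possible convention, chosen here so that the result matches the earlier quicksort/utils example.

```scheme
;; All elements of a non-empty list except the last one.
(define (all-but-last lst)
  (if (null? (cdr lst))
      '()
      (cons (car lst) (all-but-last (cdr lst)))))

;; Assumption: a relative lib-path is a list whose first element
;; is the symbol ... (a legal identifier in Scheme).
(define (relative-lib-path? ref)
  (and (pair? ref) (eq? (car ref) '...)))

;; Replace the last component of the containing library's path
;; with the components of the relative reference.
(define (resolve-lib-path ref containing)
  (if (relative-lib-path? ref)
      (append (all-but-last containing) (cdr ref))
      ref))

(resolve-lib-path '(... utils)
                  '(scheme "acme.com" wiley quicksort))
;; => (scheme "acme.com" wiley utils)
```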