
Overuse of strings

This page is part of the web mail archives of SRFI 83 from before July 7th, 2015. The new archives for SRFI 83 contain all messages, not just those from before July 7th, 2015.



One mistake in the XML world is the overuse of strings to represent
structured data. Scheme has tended to be able to avoid this with its
relatively lightweight notation for lists, trees and symbols.

I feel that this SRFI abuses strings in places: in the language name
"scheme://r6rs", and in the definition of <lib-path>. Instead, these
two should be S-expressions.

URIs are commonly written
  http://www.example.com/some/path?param=value
which requires a sophisticated parser to recover the structure embedded
in the string. A Scheme system might instead represent such a URI as
  (http "www.example.com" "some" "path" (param "value"))
or even
  (http (www example com) some path (param value))
requiring no specialised character-level parsing.
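
To make the point concrete, here is a rough sketch of pulling the
pieces out of the first S-expression form using nothing but list
operations. The accessor names (uri-scheme, uri-host, uri-path,
uri-param) are made up for this illustration and are not part of any
proposal; the layout assumed is scheme, host, path segments, then
(name value) parameter lists.

```scheme
;; A URI held as an S-expression, laid out as assumed above.
(define example-uri
  '(http "www.example.com" "some" "path" (param "value")))

;; Hypothetical accessors: plain list operations, no string parsing.
(define (uri-scheme uri) (car uri))
(define (uri-host uri) (cadr uri))

;; Path segments are the non-list elements after the host.
(define (uri-path uri)
  (let loop ((rest (cddr uri)) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((pair? (car rest)) (loop (cdr rest) acc))
          (else (loop (cdr rest) (cons (car rest) acc))))))

;; Parameters are the list-valued elements; return the value of the
;; first one whose name matches, or #f if there is none.
(define (uri-param uri name)
  (let loop ((rest (cddr uri)))
    (cond ((null? rest) #f)
          ((and (pair? (car rest)) (eq? (caar rest) name))
           (cadr (car rest)))
          (else (loop (cdr rest))))))
```

With these, (uri-path example-uri) yields the list of path segments
and (uri-param example-uri 'param) yields the parameter's value,
without ever scanning for "/", "?" or "=".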

As another example, the XPath language has a complicated grammar for
expressions. XPath expressions are often embedded in attributes of XML
documents (I'm thinking in particular of XSLT here). Quite aside from
the ridiculous problems this causes relating to XML Namespaces,
embedding structure in strings causes problems for code-processing
tools, which again have to have heavy-duty parsers to extract the
required information from its string encoding. Sometimes parsing
technology simply isn't available, and baroque staged-evaluation-like
solutions have to be put in place - all for the lack of consistent use
of the facilities for representing structure offered by core XML.

So here's an alternative suggestion for the definitions of <lib-path>
and the language name:

  [...]

  A <lib-path> is therefore represented by a list of path components,
  or a symbol shortcut (defined below). Each path component must be an
  atom. The first path component (the primary component) is the most
  significant. The primary component ``scheme'' should be used to
  refer to libraries that are released by the Scheme community at
  large, in much the same way that a Java package name is used. For
  example, (scheme "acme.com" wiley quicksort) might refer to a
  quicksort library that is distributed by Wiley at Acme. If Wiley's
  quicksort library contains the <lib-path> "utils", it can be
  expanded as (scheme "acme.com" wiley utils).

  The <lib-path> (scheme r6rs) must be supported by every
  implementation, and it must export all bindings in R5RS, plus
  import, export, and indirect-export. Lib-paths starting with (scheme
  rNrs) and (scheme srfi-N) are reserved for definition by standards
  processes.

  The way that <lib-path>s are mapped to library declarations is
  implementation-specific. Implementations might, for example,
  interpret a lib-path with a ``uri'' primary component as a reference
  to a library within a UTF-8-encoded package on the web. [...]
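
As a sketch of how little machinery this needs, the following checks
whether a datum is a well-formed <lib-path> and expands a symbol
shortcut. Two assumptions are mine: "atom" is taken to mean a symbol
or a string, and the shortcut is taken to replace the last component
of the enclosing library's path, which is what the quicksort/utils
example above suggests. The procedure names are hypothetical.

```scheme
;; Assumption: a path component is a symbol or a string.
(define (path-component? x)
  (or (symbol? x) (string? x)))

;; True if lst is a proper list whose elements are all path components.
(define (all-components? lst)
  (cond ((null? lst) #t)
        ((not (pair? lst)) #f)
        ((path-component? (car lst)) (all-components? (cdr lst)))
        (else #f)))

;; A <lib-path> is a non-empty list of components or a symbol shortcut.
(define (lib-path? x)
  (or (symbol? x)
      (and (pair? x) (all-components? x))))

;; Assumption: a symbol shortcut names a sibling of the enclosing
;; library, so it replaces the last component of the (non-empty)
;; enclosing path. A full path is returned unchanged.
(define (expand-lib-path enclosing path)
  (if (symbol? path)
      (let loop ((rest enclosing))
        (if (null? (cdr rest))
            (list path)
            (cons (car rest) (loop (cdr rest)))))
      path))
```

For instance, expanding the shortcut utils inside
(scheme "acme.com" wiley quicksort) gives (scheme "acme.com" wiley utils).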

Actual references to HTTP-resident library bodies could perhaps be
built from base URIs and path-component offsets, as in

  (uri <base-uri> <component> ...)

such that

  (uri "http://example.com/" foo bar)

might be interpreted as "http://example.com/foo/bar.scm". Perhaps such
interpretations of primary components are best left to later SRFIs,
since otherwise we start down the slippery slope of pulling in the
rest of the world's container- and transport-definitions.
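
For what it's worth, one possible interpretation is only a few lines
long. This is a sketch, not a proposal: the procedure names are
hypothetical, and it assumes the base URI already ends in "/" and that
library bodies carry a ".scm" suffix, as in the example above.

```scheme
;; Components may be symbols or strings; normalise to strings.
(define (component->string c)
  (if (symbol? c) (symbol->string c) c))

;; Join a list of strings with "/" between adjacent elements.
(define (join-with-slash strs)
  (cond ((null? strs) "")
        ((null? (cdr strs)) (car strs))
        (else (string-append (car strs) "/"
                             (join-with-slash (cdr strs))))))

;; lib-path has the shape (uri <base-uri> <component> ...).
;; Assumptions: <base-uri> ends in "/", and ".scm" is the suffix.
(define (uri-lib-path->string lib-path)
  (string-append (cadr lib-path)
                 (join-with-slash (map component->string (cddr lib-path)))
                 ".scm"))
```

Note that the only string handling here is concatenation; nothing has
to be parsed back out of the result.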

Questions and possible variations:

 1. should path components be restricted further to being only
    symbols, or only strings? If we restrict ourselves to symbols
    only, we end up with names like

    (scheme com acme wiley quicksort) ;; more like Java

    which seems fine and forces structure out into the open where it
    belongs; on the other hand, if strings are disallowed as path
    components, lib-paths like (uri "http://example.com/" library),
    which contain structure that raw symbols cannot represent, are
    disallowed. Perhaps this is a good thing, but probably not.

 2. should paths be big-endian, or little-endian? Compare

    (scheme "acme.com" wiley quicksort)
    (scheme "acme.com" wiley utils)

    with

    (quicksort wiley "acme.com" scheme)
    (utils wiley "acme.com" scheme)

 3. should the "scheme" primary component be reserved only for language
    definitions? It seems redundant to require it for normal libraries.

    (com acme wiley quicksort)
    (com acme wiley utils)

 4. is the mention of R5RS a typo in the "Library References" section?

 5. how might relative lib-paths be accommodated in an S-expression
    alternative to URIs? Perhaps with "(... utils)"?
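
On question 2, a small sketch of the practical difference: the two
orderings are related by a plain reverse, but only in the big-endian
form do sibling libraries share a common prefix that tools could use
for grouping. The procedure common-prefix is a hypothetical helper
written for this illustration.

```scheme
;; Longest common leading run of two lists, compared with equal?.
(define (common-prefix a b)
  (if (and (pair? a) (pair? b) (equal? (car a) (car b)))
      (cons (car a) (common-prefix (cdr a) (cdr b)))
      '()))

(define big-quicksort '(scheme "acme.com" wiley quicksort))
(define big-utils     '(scheme "acme.com" wiley utils))

;; The little-endian forms are just the reversed lists.
(define little-quicksort (reverse big-quicksort))
(define little-utils     (reverse big-utils))
```

Here (common-prefix big-quicksort big-utils) recovers the shared
(scheme "acme.com" wiley) part, whereas the same call on the
little-endian forms returns the empty list, since those paths differ
in their first component.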


Regards,
  Tony