Title

Universal Identifiers

Author

Andrew Wilcox
http://andrewwilcox.name/

Status

This SRFI is currently in "draft" status. To see an explanation of each status that a SRFI can hold, see here. It will remain in draft status until 2006/03/27, or as amended. To provide input on this SRFI, please mail srfi-84@srfi.schemers.org. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Abstract

This SRFI proposes a social convention to allow programmers to easily create short, simple Scheme symbols which are guaranteed to be universally unique: No other programmer also following this SRFI will accidentally create a symbol eq? to yours.

Universally unique symbols are useful to identify standards, languages, libraries, types, classes, and other resources.

Issues

Rationale

Universal identifiers feature in a couple of recent SRFI proposals. The 2005-10-17 draft of SRFI-76: R6RS Records recommends using UUID's (RFC 4122) to uniquely identify non-generative records, and the 2005-11-30 draft of SRFI-83: R6RS Library Syntax uses a URI (RFC 2396) to identify the R6RS language.

While URI's and UUID's are adequate for providing computers with unambiguous identifiers, people also have needs with respect to universal identifiers that I suggest can be better met with the alternative described here.

Why Not URI's?

The URI syntax is more complex than we need. (Without going to look, is the identifier in SRFI-83 for R6RS "scheme://r6rs/" or is it "scheme://r6rs"?)

URI's also don't give you guidance for constructing your own unique identifier. (You've written a library. Quick! Come up with a URI that's short, easy to type, and that you know no one else is using.)

Why Not UUID's?

UUID's are great for computers. Your average computer can generate 10 million UUID's in a second and every one is guaranteed to be different from any other UUID.

But they're not so great for people. (Quick! Is urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6 the same as or different from the example UUID found in the middle of page 4 of RFC 4122?)

And people are important. We copy and paste code. We look at libraries and try to figure out why they are colliding when they shouldn't be.

It's easy to say that one should use UUID's correctly, but we can also look at what people have had trouble with in practice: consider the experience of using the Microsoft Windows installer, which uses UUID's to track the components of installed programs. It's a great hassle when you've updated your component's UUID's when you weren't supposed to, or not updated your UUID's when you should have, and now the Windows installer is broken...

The Java, Perl, Python, and Ruby communities all manage a large namespace of code, and they avoid collisions through social conventions instead of using machine-generated unique identifiers.

Use Symbols For Identifiers

Lisp languages and Scheme have an elegant property that "two symbols are identical if and only if they are spelled the same way" [R5RS].

This makes symbols ideal for universal identifiers, instead of using strings or some other type.
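
As a small illustration of that property (a sketch only, assuming an implementation that keeps lowercase symbol names as written, as most do), a symbol built at run time from a string is eq? to the same name written literally in source:

```scheme
;; A symbol written in source and one built from a string are the same
;; object when their names are spelled the same way.
(define literal-id 'web:server:spiffy)
(define built-id   (string->symbol "web:server:spiffy"))

(eq? literal-id built-id)          ; => #t, same spelling
(eq? literal-id 'web:server:plt)   ; => #f, different spelling
```

Strings offer no such guarantee: two strings with the same contents need not be eq?, so comparing them requires string=? or equal?.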

Make the Names of Important Standards Really Short

As I mentioned above, the SRFI-83 R6RS Library Syntax draft of 2005-10-31 proposes that the identifier for R6RS be:

"scheme://r6rs"

(look, no slash at the end...)

Many millions of people in future generations will be using Scheme, so we want to make this really easy. This SRFI proposes that the universal identifier for R6RS instead be simply:

r6rs

Libraries Organized by Category

Many popular open source library collections are organized by category, sometimes with subcategories within the category, and then libraries within the (sub)category.

Looking forward to the time that we have a large body of reusable libraries that can be used in any R6RS implementation, this SRFI proposes and reserves a naming system for public libraries:

web:server:spiffy
web:server:plt
web:servlet:kawa
...

Such a naming system will require community support, such as through schemers.org, to allow people to register the names of their libraries.
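
To make the category structure concrete, here is a minimal sketch of how a tool such as a registry might take a library name apart. The procedure name library-name-components is hypothetical and not part of this SRFI, and the sketch assumes it is given a base part with no "::" extension:

```scheme
;; Hypothetical helper: split a library name such as web:server:spiffy
;; into its components, from outermost category to library name.
;; Walks the characters from the end so each part is built in order.
(define (library-name-components id)
  (let loop ((chars   (reverse (string->list (symbol->string id))))
             (current '())
             (parts   '()))
    (cond ((null? chars)
           (cons (list->string current) parts))
          ((char=? (car chars) #\:)
           (loop (cdr chars) '() (cons (list->string current) parts)))
          (else
           (loop (cdr chars) (cons (car chars) current) parts)))))

(library-name-components 'web:server:spiffy)   ; => ("web" "server" "spiffy")
```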

Domain Names

Many projects, groups, and businesses have registered their own domain name. Your domain name is a natural choice to use as the base for your universal identifier... when, of course, the domain is relevant to the resource you are naming.

call-with-current-continuation.org
plt-scheme.org
...

HTTP URL's

Many people and projects, even if they don't have their own domain name, do have their own home page on the web.

http://swissnet.ai.mit.edu/~jaffer/SLIB.html
http://www.gnu.org/software/kawa/
...

This SRFI supports using HTTP URL's as the base of a universal identifier when that is convenient.

Building Upon the Base

Once you have selected a base unique identifier which is simple, convenient, and easy to read for your resource, more unique identifiers can be easily constructed.

web:server:plt::request-type
plt-scheme.org::drscheme
http://swissnet.ai.mit.edu/~jaffer/SLIB.html::solid

The rules of this SRFI guarantee that none of these different schemes will clash with each other.

Now Use It

This SRFI is designed so that any universal identifier constructed following these rules can be read into any conforming R6RS implementation and used as an identifier without needing to be quoted with the vertical bar (|) syntax.
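
For instance, the identifiers shown earlier can appear directly in program text. The following is only a sketch: the bound values and the registry association list are placeholders invented for illustration, not anything this SRFI defines.

```scheme
;; Universal identifiers used directly as variable names, with no |...| quoting.
;; The values bound here are placeholders.
(define web:server:plt::request-type 'get)
(define plt-scheme.org::drscheme "placeholder value")

;; The same names also work as quoted symbols, for example as keys in an
;; association list (a hypothetical registry).
(define registry
  '((web:server:plt::request-type . "request type of a web server library")
    (plt-scheme.org::drscheme     . "an application named under a domain")))

(cdr (assq 'plt-scheme.org::drscheme registry))
```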

Specification

A universal identifier is a Scheme symbol that has a base part and an optional extension.

  <universal-identifier> = <base-part>
                         | <base-part> :: <extension>

The "::", if present, is part of the symbol, as if the symbol were created by:

    (string->symbol (string-append base-part "::" extension))

The base part is something that you own and is unique to you. The extension can be whatever you like, as long as the resulting universal identifier can be read into an R6RS implementation without quoting.
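
The construction above, and its inverse, can be sketched with two small helpers. Both procedure names are hypothetical and not part of this SRFI; the sketch uses only R5RS procedures and performs no validity checking.

```scheme
;; Hypothetical helper: join a base part and an extension with "::".
(define (make-universal-identifier base-part extension)
  (string->symbol (string-append base-part "::" extension)))

;; Hypothetical helper: return the base part of a universal identifier as a
;; string, i.e. everything before the first "::", or the whole name if
;; there is no extension.
(define (universal-identifier-base id)
  (let* ((s (symbol->string id))
         (n (string-length s)))
    (let loop ((i 0))
      (cond ((>= (+ i 1) n) s)
            ((and (char=? (string-ref s i) #\:)
                  (char=? (string-ref s (+ i 1)) #\:))
             (substring s 0 i))
            (else (loop (+ i 1)))))))

(define drscheme-id (make-universal-identifier "plt-scheme.org" "drscheme"))
(universal-identifier-base drscheme-id)   ; => "plt-scheme.org"
(universal-identifier-base 'r6rs)         ; => "r6rs"
```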

Valid Identifiers

To be valid, a universal identifier must be a name that can be used as a Scheme identifier and does not need to be quoted with the vertical bar (|) character.

Note that this disallows some base parts that could otherwise be constructed, such as some HTTP URL's.

Base-Part Options

You may choose whichever kind of base part is most convenient for you:

  <base-part> = <open-source-library>
              | <rNrs>
              | <srfi-N>
              | <domain-name>
              | <http-url>

Restrictions and requirements for each kind of base part are described here.

Implementation

No code is needed to use this SRFI.

The following code (R5RS with SRFI-13: String Libraries and SRFI-14: Character-set Library) may be used to check if a universal identifier has a correct form and can be used unquoted in R6RS Scheme implementations.

It does not check whether domain names are legal domain names or if HTTP URL's are valid URL's.


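;; Characters that would force a symbol to be written with |...| quoting:
;; whitespace, the vertical bar, and the backslash.
(define needs-quoting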
  (char-set-union char-set:whitespace
                  (char-set #\| #\\)))

;; Characters that may begin a numeric literal; a base part must not start
;; with one of these.
(define can-begin-a-number
  (char-set-union char-set:digit
                  (char-set #\# #\+ #\. #\-)))

;; True if s has the form r<digits>rs, for example "r6rs".
(define (rnrs? s)
  (and (string-prefix? "r"  s)
       (string-suffix? "rs" s)
       (> (string-length s) 3)
       (string-every char-set:digit s 1 (- (string-length s) 2))))

;; True if s has the form srfi-<digits>, for example "srfi-75".
(define (srfi? s)
  (and (string-prefix? "srfi-" s)
       (> (string-length s) 5)
       (string-every char-set:digit s 5 (string-length s))))

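;; Classifies x (a string or symbol) by its base part. Returns a symbol
;; naming the kind of base part, or a symbol beginning with error:.
(define (universal-identifier-type x)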
  
  (let* ((s (cond ((string? x)  x)
                  ((symbol? x)  (symbol->string x))
                  (else         #f)))

         (base (cond ((string-contains s "::")
                      => (lambda (index)
                           (string-take s index)))
                     (else s))))

    (cond ((equal? base "")
           'error:empty-base-part)

          ((char-set-contains? can-begin-a-number (string-ref base 0))
           'error:can-begin-a-number)

          ((string-any needs-quoting base)
           'error:can-not-be-used-unquoted)

          ((string-prefix? "http:" base)
           ;; TODO: add tests for unencoded ::, \, or | chars here?
           'http-url)

          ((string-index base #\.)
           (if (string-index base #\:)
               'error:ambiguous-library-or-domain-name
               'domain-name))
                
          ((string-index base #\:)
           (if (string-index base #\/)
               'error:library-should-not-contain-slash
               'open-source-library))

          ((rnrs? base) 'rNrs)

          ((srfi? base) 'srfi-N)

          (else
           'error:unidentified))))

(define tests
  '((""                                error:empty-base-part)
    ("::foo"                           error:empty-base-part)

    ("1my-library:one"                 error:can-begin-a-number)
    ("-my-library:minus"               error:can-begin-a-number)

    ("games:adventure:colossal-cave"   open-source-library)
    ("games:/adventure"                error:library-should-not-contain-slash)
    ("games:->|"                       error:can-not-be-used-unquoted)

    ("r6rs"                            rNrs)
    ("r600rs"                          rNrs)
    ("rrs"                             error:unidentified)
    ("rx6rs"                           error:unidentified)
    ("r6xrs"                           error:unidentified)

    ("srfi-75"                         srfi-N)
    ("srfi-"                           error:unidentified)
    ("srfi-x75"                        error:unidentified)
    ("srfi-75x"                        error:unidentified)

    ("schemers.org"                    domain-name)
    ("schemers.org:srfi"               error:ambiguous-library-or-domain-name)
    ("schemers.org::srfi"              domain-name)

    ("http://swissnet.ai.mit.edu/~jaffer/SLIB.html"
     http-url)
    ))

(for-each (lambda (test)
            (let ((input           (car test))
                  (expected-result (cadr test)))
              (let ((actual-result (universal-identifier-type input)))
                (cond ((equal? expected-result actual-result)
                       (display "ok ") (display input) (newline))
                      (else
                       (display "FAIL ")        (display input)
                       (display ", expected: ") (write expected-result)
                       (display ", actual: ")   (write actual-result)
                       (newline))))))
          tests)

Copyright

Copyright (C) Andrew M. Wilcox (2005). All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Mike Sperber
Last modified: Thu Jan 26 08:53:34 CET 2006