by Duncan Guthrie
This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-275@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.
This SRFI proposes a programming interface for working with RFC 3986 uniform resource identifiers (URIs), as well as RFC 3987's generalisation to internationalised resource identifiers (IRIs). This document defines record types, normalisation procedures, and conversion between URIs and IRIs. Additionally, we contribute a test suite to specify the behaviour of normalisation with respect to relative references, which has in the past been a source of divergence between implementations.
none so far
RFC 3986 [1] describes an abstract syntax for uniform resource identifiers (URIs) as well as their relative references, which allow documents to be authored without knowing the final publishing location. RFC 3987 [2] defines IRIs, which generalise URIs to Unicode by defining an interpretation of escapes as sequences of UTF-8 octets.
URIs and IRIs are widely used to denote resources across the world wide web, and are the basis of a number of web standards. It is critical to present a programming interface for manipulation of different components in isolation (e.g. paths, hostnames), as working with an URI's coarse string representation directly is error-prone. RFC 3986 defines an abstract syntax and hence conveniently forms the basis of such a programming interface. Further, RFC 3987's generalisation to internationalised identifiers is a natural extension, allowing us to faithfully denote resources in a number of languages, using the universal character set. Indeed, IRIs form the basis of modern standards like the Resource Description Framework (RDF), a widespread formal model for metadata interchange and knowledge representation. We hence require support for both URIs and IRIs.
RFC 3986 and 3987 distinguish URIs and IRIs, respectively, from their relative references, which must be resolved against a base URI or IRI in order to be used. The main application of relative references is to allow one to refer to resources and author documents without knowing the final publishing location. For example, a graph database may produce RDF documents specified in RDF/XML, where resources are denoted with IRI-relative references, assuming that these documents would be interchanged with another system which mints full IRIs with respect to its hosting location.
Somewhat confusingly, RFC 3986 defines "URI-reference" as the most common usage of URI: either an URI or a relative reference. We follow this common usage by developing polymorphic getters and setters which work on URIs and relative references, with the predicate uri? holding true for both URIs and relative references. This procedure would hence correspond to testing for an RFC 3986 "URI-reference". (RFC 3987 follows an identical convention for IRIs and their relative references, so the same approach is followed for IRIs.)
The most divergent behaviour between widespread implementations of URIs and IRIs has been with respect to normalisation of relative references. We explicitly avoid defining path segment normalisation for relative references (regardless of whether the path is absolute or relative) because the behaviour is largely undefined by RFC 3986 and 3987 or any current RFC. See the section on normalisation for more details and the test suite.
URIs and IRIs are structured data. Of course, record types are convenient as they generate dedicated getters and setters. More importantly, however, we think that record structures are necessary in this case for normalisation procedures to be structure-preserving. Specifically, normalisation procedures may alter the path, such that its segments may be mistaken for other parts of the URI or IRI, such as the <://> portion separating scheme from authority, or for the hostname. Updates to authority components or to the path need to similarly ensure that they produce valid URIs and IRIs when serialised.
We argue that a string representation makes it too easy to inadvertently modify the structure, because normalisation of individual components may yield a string which, when parsed again, is interpreted differently with respect to the structure. A good example of this is the restriction on paths when an authority is present: an authority is introduced in an URI by two slashes, and slashes may also appear in paths.
A string representation of an URI or relative reference, or an IRI or relative reference (procedures uri->string and iri->string) is produced by concatenating string representations of the individual fields (scheme, hostname &c.), with the expected separators between components. For efficiency, no assumption should be made that the contents of individual fields can be checked at this point, which typically would involve additional, redundant parsing.
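The concatenation described above can be sketched in R7RS Scheme, following the recomposition steps of RFC 3986 section 5.3. The procedure name recompose and its seven positional arguments are assumptions made for illustration and are not part of this SRFI; absent components are represented as #f, as in the examples in this document. In keeping with the paragraph above, nothing is validated at this point.

```scheme
(import (scheme base))

;; Hypothetical helper: join already-validated components into a string.
;; #f means "component absent"; port is an exact integer or #f.
(define (recompose scheme user host port path query fragment)
  (string-append
   (if scheme (string-append scheme ":") "")
   ;; An authority is written as soon as any of user, host or port is set.
   (if (or user host port)
       (string-append "//"
                      (if user (string-append user "@") "")
                      (or host "")
                      (if port (string-append ":" (number->string port)) ""))
       "")
   (or path "")
   (if query (string-append "?" query) "")
   (if fragment (string-append "#" fragment) "")))

(recompose "http" #f "example.org" 80 "/ex" #f "IRI")
;; → "http://example.org:80/ex#IRI"
(recompose #f #f #f #f "/ex" #f "IRI")
;; → "/ex#IRI"
```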
This document defines an interface for both pure-functional and in-place updates to URIs and IRIs, with the latter interface defined in optional sub-libraries. The reasoning behind this is that although in-place setters may be more efficient, existing implementations of URIs and IRIs have not provided in-place setters, if they provide setters at all. Further, this document is minimally prescriptive of the internal representation, as implementors may select different optimised data structures to represent immutable IRIs to take advantage of information sharing.
We require implementations to provide a pure-functional interface to URIs and IRIs, but not an in-place interface. The reasoning for this is that if implementors choose data structures optimised for purely functional programming, it is more cumbersome to create an impure interface, whereas it is not as cumbersome for implementors to create an (inefficient) pure-functional interface by copying the URI or IRI before updating in-place.
Indeed, while the pure-functional procedures are convenient to programmers, in the sample implementation, they are less efficient because they copy the URI or IRI before performing operations in-place. The sample implementation uses an internal (SRFI 160 [8]) bytevector representation to efficiently store both UTF-8 octets and percent-encoded escapes.
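The copy-then-mutate strategy described above can be sketched as follows. The two-field record <ident> and the procedure update-ident-path are hypothetical stand-ins chosen for brevity; they are neither the record types of this SRFI nor the representation used by the sample implementation.

```scheme
(import (scheme base))

;; Hypothetical two-field stand-in for an identifier record.
(define-record-type <ident>
  (make-ident scheme path)
  ident?
  (scheme ident-scheme set-ident-scheme!)
  (path ident-path set-ident-path!))

;; Pure-functional update built on the in-place setter:
;; copy first, then mutate only the copy.
(define (update-ident-path ident new-path)
  (let ((copy (make-ident (ident-scheme ident) (ident-path ident))))
    (set-ident-path! copy new-path)
    copy))

(define a (make-ident "http" "/old"))
(define b (update-ident-path a "/new"))
(ident-path a) ;; → "/old" (the original is untouched)
(ident-path b) ;; → "/new"
```

The cost of the pure interface in this sketch is one extra allocation per update, which is the inefficiency the text refers to.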
If an implementation provides disjoint mutable and immutable URI and IRI variants, then it is an error to call the in-place setters on an immutable variant.
Our design is to provide getters and setters which abstract away the internal representation, working on string representations of URI components. More generally, we suspect that existing implementations largely omit setters and updaters because preserving internal consistency is fairly cumbersome for both implementor and programmer, with validity of authority components being defined with respect to the path, and vice versa. The specific challenge is to ensure that setting a given component would not violate the URI grammar, as this may yield a flat string representation which would have a different structure when parsed again.
First, the scheme, query and fragment components do not depend on validity of other components. Setters and updaters invoke the respective parsers on the string representation to be set, and raise an error if the parse failed.
Second, for the path component, validity depends on whether a) any authority component is set; and b) whether we are setting it for an URI, or for a relative reference.
- If an authority component is set, the path must either be empty or begin with a slash (path-abempty production). An error is raised if there is a parse failure.
- If no authority component is set and the path begins with a slash (path-absolute production), then the first segment must not be empty (double slash would be parsed as the separator between scheme and authority).
The validity of an authority component likewise depends on the shape of the path. See RFC 3986 ABNF for details.
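As an illustration of how the path check depends on the authority, the following R7RS sketch tests only the slash rules of RFC 3986 section 3.3. The procedure name path-shape-valid? and its boolean has-authority? argument are assumptions for this example; a real setter would additionally parse every segment against the grammar and distinguish URIs from relative references.

```scheme
(import (scheme base))

;; Hypothetical check covering only the slash rules of RFC 3986 section 3.3.
(define (path-shape-valid? path has-authority?)
  (let ((len (string-length path)))
    (if has-authority?
        ;; With an authority: the path is empty or starts with a slash.
        (or (= len 0)
            (char=? (string-ref path 0) #\/))
        ;; Without an authority: the path must not start with "//".
        (not (and (>= len 2)
                  (char=? (string-ref path 0) #\/)
                  (char=? (string-ref path 1) #\/))))))

(path-shape-valid? "/some/where" #t)  ;; → #t
(path-shape-valid? "some/where" #t)   ;; → #f
(path-shape-valid? "//g" #f)          ;; → #f
(path-shape-valid? ".///g" #f)        ;; → #t
```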
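We support three scheme-independent normalisation procedures:
- Case normalisation: case-insensitive components (scheme and host) and the hexadecimal digits of escapes are converted to a canonical case.
- Escape normalisation: escapes (percent-encodings) are decoded or encoded into a canonical form.
- Path segment normalisation: control segments (. and ..) are eliminated.
The scheme and host components of both URIs and IRIs are considered case-insensitive, with other components considered case-sensitive. For URIs, the repertoire of characters is within U.S. ASCII, whereas for IRIs, Unicode characters may appear. Nonetheless, for both URIs and IRIs, only U.S. ASCII is case-insensitive, with characters like "É" never being normalised to a lower-case form like "é".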
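The ASCII-only rule can be made concrete with a small R7RS sketch. The names ascii-downcase-char and ascii-downcase are hypothetical helpers for this example, not procedures defined by this SRFI; note that the standard string-downcase would not be suitable here, since it may also lower-case non-ASCII letters.

```scheme
(import (scheme base))

;; Lower-case only A-Z; every other character is returned unchanged.
(define (ascii-downcase-char c)
  (if (and (char>=? c #\A) (char<=? c #\Z))
      (integer->char (+ (char->integer c) 32))
      c))

(define (ascii-downcase str)
  (string-map ascii-downcase-char str))

(ascii-downcase "Example.ORG")         ;; → "example.org"
(ascii-downcase "CRÊPES.example.org")  ;; → "crÊpes.example.org"
```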
Escapes (percent-encodings) take the form % 0-F 0-F (hexadecimal digits), and these hexadecimal digits have a canonical upper-case form. For example, %cf would be normalised to %CF. In practice, the sample implementation always parses these into a canonical form as it represents these internally as octets, not in the original string form. If implementations do preserve the original form, they must always support normalisation into the canonical upper-case form.
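For implementations that keep the original string form, upper-casing the hexadecimal digits might look like the following R7RS sketch. The procedure name upcase-escapes is an assumption for this example, and the input is assumed to be well-formed already (every % is followed by two hexadecimal digits); no escape is decoded.

```scheme
(import (scheme base) (scheme char))

;; Upper-case the two hexadecimal digits following each "%".
(define (upcase-escapes str)
  (let loop ((chars (string->list str)) (acc '()))
    (cond
     ((null? chars) (list->string (reverse acc)))
     ((and (char=? (car chars) #\%)
           (pair? (cdr chars))
           (pair? (cddr chars)))
      (let ((hi (cadr chars))
            (lo (car (cddr chars))))
        ;; acc is built in reverse, so push "%", then hi, then lo.
        (loop (cdr (cddr chars))
              (cons (char-upcase lo)
                    (cons (char-upcase hi)
                          (cons #\% acc))))))
     (else (loop (cdr chars) (cons (car chars) acc))))))

(upcase-escapes "/%Fa/%FB/%fC")  ;; → "/%FA/%FB/%FC"
```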
The interpretation of escapes differs for URIs and IRIs.
For URIs, octets may individually be decoded to ASCII characters. Essentially, this occurs if a character is not in the URI reserved range, and if it is permissible within a given URI component (e.g. path). Characters in the reserved range, if encountered in the clear, must never be percent-encoded. Conversely, characters not in the reserved range, and which are not permissible within a given URI component, are encoded as a series of escapes corresponding to a series of UTF-8 octets. This bears particular mention because, while other encodings are valid URIs, the RFC 3986 specification specifically requires this encoding, which enables compatibility with the closely related RFC 3987 specification for IRIs.
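The encoding direction, in which a character is replaced by escapes for its UTF-8 octets, can be sketched in R7RS as follows. The names octet->escape and escape-non-ascii are hypothetical; this simplified version escapes every non-ASCII character and deliberately ignores the component-specific rules about which ASCII characters are permissible.

```scheme
(import (scheme base))

;; Render one octet as "%XX" with upper-case hexadecimal digits.
(define (octet->escape octet)
  (let ((digits "0123456789ABCDEF"))
    (string #\%
            (string-ref digits (quotient octet 16))
            (string-ref digits (remainder octet 16)))))

;; Escape every octet of every non-ASCII character; ASCII passes through.
(define (escape-non-ascii str)
  (let ((octets (string->utf8 str)))
    (let loop ((i 0) (acc '()))
      (if (= i (bytevector-length octets))
          (apply string-append (reverse acc))
          (let ((octet (bytevector-u8-ref octets i)))
            (loop (+ i 1)
                  (cons (if (< octet 128)
                            (string (integer->char octet))
                            (octet->escape octet))
                        acc)))))))

(escape-non-ascii "/in/Rhône")  ;; → "/in/Rh%C3%B4ne"
```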
IRIs not only generalise the range of characters permissible within the IRI to certain Unicode ranges, but also interpret percent-encodings as UTF-8. Because all URIs are valid IRIs, the normalisation of IRIs with respect to percent-encodings is essentially the same as conversion from an URI to an IRI. During conversion, the entire IRI is interpreted as a UTF-8 code sequence, with any percent-encoding not part of a valid UTF-8 sequence being reencoded.
Path segment normalisation is structurally identical for both URIs and IRIs. It interprets an entire path with respect to two control sequences: . (current directory) and .. (parent directory), similar to UNIX paths. Unlike the other two normalisation procedures, path segment normalisation is undefined for relative references, because the path portion is not meaningful except during relative reference resolution.
It should be noted that path segment normalisation is defined for non-relative IRIs with relative paths, such as the URN-like <foo:a/b/../.././../../e>. This is a major source of divergence between RFC 3986 implementations, which appears to arise from reusing the remove-dot-segments procedure as defined in RFC 3986 verbatim. In relative reference resolution, this procedure is never called on relative paths, only on absolute paths.
For example, for the (non-relative) IRI <foo:a/b/../.././../../e>, implementations using the RFC 3986 procedure verbatim get <foo:/e>, whereas other implementations get <foo:e>. In the former camp are implementations like the Erlang/OTP system's built-in uri_string [4], and Guile-RDF [6], whereas in the latter camp are implementations like Haskell's network-uri [3]. We are in the latter camp.
A fixed remove-dot-segments might be implemented as follows. This procedure works on an internal representation of a path: a list whose elements are each either a forward-slash character or a segment (a list of characters or percent-encodings). The example uses the pattern matcher of SRFI 262 [9] and the break procedure of SRFI 1 [7]:
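;; Drop the most recently buffered chunk, if any (used for "..").
(define (drop~1 cd)
  (match cd
    ['() '()]
    [(cons head tail) tail]))

;; Split off the first segment together with its trailing slash, if present.
;; Returns two values: the chunk to buffer and the remaining path.
;; eqv? is used because eq? on characters is implementation-dependent.
(define (next-segment path)
  (match-values (break (lambda (seg) (eqv? seg #\/)) path)
    [(r (cons #\/ ps1))
     (values (append r (list #\/)) ps1)]
    [(r _)
     (values r '())]))

;; Entry point: a leading slash is set aside so that ".." can never
;; remove it; the remainder is processed as a relative path.
(define (remove-dot-segments path)
  (match path
    ['() '()]
    [(cons #\/ seg*)
     (cons #\/ (remove-dot-segments/relative seg*))]
    [_
     (remove-dot-segments/relative path)]))

;; buffer holds the chunks kept so far, most recent first.
;; flatten is a helper which is not defined in this excerpt.
(define (remove-dot-segments/relative path)
  (let elim-dots ([path path]
                  [buffer '()])
    (match path
      ['()
       (if (null? buffer)
           buffer
           (flatten (reverse buffer)))]
      [(cons* (list #\.) #\/ next-path)
       (elim-dots next-path buffer)]
      [(list (list #\.))
       (elim-dots '() buffer)]
      [(cons* (list #\. #\.) #\/ next-path)
       (elim-dots next-path (drop~1 buffer))]
      [(list (list #\. #\.))
       (elim-dots '() (drop~1 buffer))]
      [_
       (let-values ([(shift-this next-path)
                     (next-segment path)])
         (elim-dots next-path (cons shift-this buffer)))])))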
Additionally, the above suggested implementation of remove-dot-segments is considerably clearer than the stack-based description in RFC 3986, with fewer pattern-matching clauses required.
Although we support both URIs and IRIs, for brevity we primarily describe behaviour for IRIs, and omit descriptions of the equivalent procedures for URIs where they behave the same. This works because URIs and IRIs are structurally identical, with the divergence between RFC 3986 and 3987 arising from the generalisation of the character set, and the treatment of normalisation.
Type signatures are specified as arrows from input arguments to a single output. Multiple values are denoted (typ ...) and () denotes unit or void (equivalent to R6RS (cond [#f #f])).
In type specifications, when we refer to iri, we refer to both IRIs and IRI-relative references (i.e. the iri type and the relative-iri type). We specify which specific type of IRI with either non-relative-iri or relative-iri. This is a little loose, but best reflects the use of polymorphism in this library interface.
Finally, throughout this document, we enclose IRIs and URIs in angle brackets, e.g. <http://example.org>.
(srfi 275 iri)
iri
(non-relative-iri? ident): any → boolean
(iri? ident): any → boolean
(iri-scheme ident): non-relative-iri → string?
(iri-user ident): iri → string?
(iri-host ident): iri → string?
(iri-port ident): iri → fixnum?
(iri-path ident): iri → string?
(iri-query ident): iri → string?
(iri-fragment ident): iri → string?
IRI record type. The record's fields are derived from the RFC 3986 URI grammar. The procedures for components other than scheme are polymorphic on both IRIs and relative references (for which see below).
Examples:
(define example-IRI (string->iri "http://example.org:80/ex#IRI"))
example-IRI → <http://example.org:80/ex#IRI>
(iri? example-IRI) → #t
(non-relative-iri? example-IRI) → #t
(iri-scheme example-IRI) → "http"
(iri-user example-IRI) → #f
(iri-host example-IRI) → "example.org"
(iri-port example-IRI) → 80
(iri-path example-IRI) → "/ex"
(iri-query example-IRI) → #f
(iri-fragment example-IRI) → "IRI"
Procedure:
(update-iri-scheme ident str): non-relative-iri → string? → non-relative-iri
(update-iri-user ident str): iri → string? → iri
(update-iri-host ident str): iri → string? → iri
(update-iri-port ident str): iri → fixnum? → iri
(update-iri-path ident str): iri → string? → iri
(update-iri-query ident str): iri → string? → iri
(update-iri-fragment ident str): iri → string? → iri
Pure-functional setters for IRIs and relative references. Apart from update-iri-port, these procedures take a string, parsing it to the relevant field. It is an error to pass a string which does not conform to the RFC 3987 grammar for that component. The update-iri-scheme procedure is undefined for relative references, and it is an error to call it on a relative reference.
relative-iri
(relative-iri? ident): any → boolean
(iri? ident): any → boolean
Relative IRI record type. Relative references do not have a scheme, and it is an error to call iri-scheme on one. The remaining procedures are polymorphic on both relative references and IRIs.
Examples:
(define example-IRI (string->iri "/ex#IRI"))
example-IRI → </ex#IRI>
(iri? example-IRI) → #t
(non-relative-iri? example-IRI) → #f
(relative-iri? example-IRI) → #t
(iri-scheme example-IRI) → <raises an ERROR>
(iri-user example-IRI) → #f
(iri-host example-IRI) → #f
(iri-port example-IRI) → #f
(iri-path example-IRI) → "/ex"
(iri-query example-IRI) → #f
(iri-fragment example-IRI) → "IRI"
Procedure:
(iri-authority ident): iri | relative-iri → #f | (string? string? fixnum?)
(update-iri-authority ident user host port): iri → #f | string? → string? → fixnum? → iri
Authority is derived from the user, host and port. The iri-authority procedure simply retrieves these as multiple values. The update-iri-authority procedure mints a new IRI with these three values set.
(iri-equal? A B): iri → iri → boolean
Holds true if the two arguments are IRIs and their fields are equal, or if the two arguments are relative references and their fields are equal. An IRI is never equal to a relative reference, and vice versa. It is an error to call this procedure when either argument is not an IRI or relative reference.
(srfi 275 iri in-place)
(set-iri-scheme! ident str): non-relative-iri → string? → ()
(set-iri-user! ident str): iri → string? → ()
(set-iri-host! ident str): iri → string? → ()
(set-iri-port! ident str): iri → fixnum? → ()
(set-iri-path! ident str): iri → string? → ()
(set-iri-query! ident str): iri → string? → ()
(set-iri-fragment! ident str): iri → string? → ()
In-place setters for IRIs and relative references. Apart from set-iri-port!, these procedures take a string, parsing it to the relevant field. It is an error to pass a string which does not conform to the RFC 3987 grammar for that component. The set-iri-scheme! procedure is undefined for relative references, and it is an error to call it on a relative reference.
(set-iri-authority! ident user host port): iri → #f | string? → string? → fixnum? → ()
Set the user, host and port authority components in-place.
(srfi 275 uri)
(srfi 275 uri in-place)
Programming interfaces identical to those for IRIs are given for URIs, with procedures and error messages renamed to reference uri instead of iri. The structure of an URI or URI-relative reference is identical to that of its IRI counterpart. The programming interface differs only in the character set permissible within an URI: string->uri signals an appropriate error when given characters outside that set, as do the setters like set-uri-host!.
(srfi 275 normalise)
(resolve-iri-reference base ref): non-relative-iri → iri → non-relative-iri
Relative reference resolution against a base IRI. An error is signalled if the base IRI is a relative reference. Relative reference resolution is, however, defined for both IRIs and relative references, although it is unusual to resolve a non-relative reference. This procedure is pure-functional as it (usually) involves transforming a relative reference into an IRI.
Simple examples derived from the RDF Turtle test cases (see full test cases at end of document):
(define string-cases (list "g:h" "g" "./g" "g/" "/g" "//g"))
(define base01 (string->iri "http://a/bb/ccc/d;p?q"))
(define base02 (string->iri "http://a/bb/ccc/d/"))
(define base07 (string->iri "file:///a/bb/ccc/d;p?q"))

;; Higher-order: returns a procedure which resolves a reference string
;; against base-iri.
(define (resolve-with base-iri)
  (lambda (ref)
    (resolve-iri-reference base-iri (string->iri ref))))

(map (resolve-with base01) string-cases)
→ (list <g:h>
        <http://a/bb/ccc/g> <http://a/bb/ccc/g> <http://a/bb/ccc/g/>
        <http://a/g> <http://g>)

(map (resolve-with base02) string-cases)
→ (list <g:h>
        <http://a/bb/ccc/d/g> <http://a/bb/ccc/d/g> <http://a/bb/ccc/d/g/>
        <http://a/g> <http://g>)

(map (resolve-with base07) string-cases)
→ (list <g:h>
        <file:///a/bb/ccc/g> <file:///a/bb/ccc/g> <file:///a/bb/ccc/g/>
        <file:///g> <file://g>)
Procedure:
(resolve-uri-reference base ref): non-relative-uri → uri → non-relative-uri
Relative reference resolution against a base URI. Behaviour is structurally identical to that of relative reference resolution for IRIs, as is the error behaviour with respect to the base URI.
(srfi 275 normalise)
(normalise-iri-case ident): iri → iri
(normalise-uri-case ident): uri → uri
Normalise case-insensitive components (scheme and host), and convert escaped characters (percent-encodings) to canonical upper-case. These procedures are pure-functional, returning the new identifier. They are structurally identical for IRIs and URIs, the only difference being that they signal an error if the argument is not an IRI or URI respectively. They are also well-defined for IRI and URI-relative references respectively.
Procedure:
(normalise-iri-escape ident): iri → iri
Normalise an IRI or relative reference's escapes (percent-encodings), potentially interpreting escapes as characters where they form part of valid UTF-8 octet sequences. Conversely, characters which are not permissible within an IRI component will be encoded as a series of escapes corresponding to UTF-8 octets. This procedure is idempotent: a fully normalised IRI or relative reference will be normalised to itself, and iri-equal? will hold true between the two. This procedure is pure-functional and structure-preserving.
(normalise-uri-escape ident): uri → uri
Normalise an URI or relative reference's escapes (percent-encodings), potentially interpreting escapes as octets in U.S. ASCII. Conversely, characters which are not permissible within an URI component will be encoded as a series of escapes corresponding to UTF-8 octets. This procedure is pure-functional and structure-preserving.
Procedure:
(normalise-iri-path-segments ident): iri → iri
(normalise-uri-path-segments ident): uri → uri
Normalise path segments by interpreting the control segments . and .. within the path. Path segments of IRI or URI-relative references are not normalised: on a relative reference these procedures simply have no effect and no error is signalled, for parity with the other normalisation procedures. While it is relatively uncommon for relative paths to appear in non-relative IRIs or URIs, they are permissible e.g. within URN components. These procedures are pure-functional and structure-preserving.
(normalise-iri ident): iri → iri
(normalise-uri ident): uri → uri
These procedures apply the three normalisation procedures described previously in sequence, in the order: escapes, then case, then path segments. These procedures are pure-functional and structure-preserving.
(srfi 275 normalise in-place)
(normalise-iri-case! ident): iri → ()
(normalise-uri-case! ident): uri → ()
In-place variants of normalise-iri-case and normalise-uri-case.
(normalise-iri-escape! ident): iri → ()
(normalise-uri-escape! ident): uri → ()
In-place variants of normalise-iri-escape and normalise-uri-escape.
(normalise-iri-path-segments! ident): iri → ()
(normalise-uri-path-segments! ident): uri → ()
In-place variants of normalise-iri-path-segments and normalise-uri-path-segments.
(normalise-iri! ident): iri → ()
(normalise-uri! ident): uri → ()
In-place variants of normalise-iri and normalise-uri.
(srfi 275 iri)
(string->iri str): string → iri
(iri->string ident): iri → string
Conversion from string to IRI or relative reference, and vice versa. string->iri is the fundamental constructor of IRIs and relative references.
Examples:
(define example-A (string->iri "http://example.org/some/where/place"))
(define example-B (string->iri "urn:/some/where/place"))
(iri->string example-A) → "http://example.org/some/where/place"
(iri->string example-B) → "urn:/some/where/place"
(srfi 275 uri)
(string->uri str): string → uri
(uri->string ident): uri → string
Conversion from string to URI or relative reference, and vice versa. string->uri is the fundamental constructor of URIs and relative references. See the above examples for IRIs.
(srfi 275 normalise)
(iri-eqv? A B): iri → iri → boolean
Two IRIs are equivalent if iri-equal? holds, or, post-normalisation with normalise-iri, iri-equal? holds. Similarly, two URIs are equivalent if uri-equal? holds, or, post-normalisation with normalise-uri, uri-equal? holds. It is an error to call iri-eqv? when either argument is not an IRI or relative reference. Similarly, it is an error to call uri-eqv? when either argument is not an URI or relative reference.
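The shape of this definition, equality first and equality after normalisation second, can be sketched generically in R7RS. The name make-equivalence is hypothetical, and the demonstration uses plain strings, with string-downcase standing in for normalise-iri and string=? standing in for iri-equal?; it is not how the sample implementation defines iri-eqv?.

```scheme
(import (scheme base) (scheme char))

;; Build an equivalence predicate from a normaliser and an equality test.
(define (make-equivalence normalise same?)
  (lambda (a b)
    (or (same? a b)
        (same? (normalise a) (normalise b)))))

;; Stand-in demonstration on plain strings.
(define demo-eqv? (make-equivalence string-downcase string=?))

(demo-eqv? "HTTP://Example.ORG" "http://example.org")      ;; → #t
(demo-eqv? "http://example.org/a" "http://example.org/b")  ;; → #f
```

Checking plain equality first lets already-identical arguments skip normalisation altogether.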
(iri->uri ident): iri → uri
Convert an IRI to an URI. This proceeds by encoding any character within certain ranges (see RFC 3987 ABNF ucschar and iprivate) to a series of escapes corresponding to those octets in UTF-8. This URI is also a valid IRI (albeit not normalised) as all URIs are valid IRIs. This procedure is structure-preserving: (non-relative) IRIs are never transformed into relative references, or vice-versa. It is an error to call this procedure where the argument is not an IRI or relative reference.
(uri->iri ident): uri → iri
Convert an URI to an IRI. This procedure can be viewed as upgrading the URI structure to that of an IRI, then normalising it as an IRI. It is an error to call this procedure where the argument is not an URI or relative reference.
In this section, we describe various test cases and the specific behaviour they evaluate. Because RFC 3986 and 3987 only provide examples for relative reference resolution, it is important to specify the exact behaviour of a correct implementation, especially with respect to normalisation. It is expected that these test cases could be the basis of a more comprehensive property-based test suite, e.g. test the entire range of reserved and unreserved characters in a particular URI component.
(normalise-uri-case)
<http://example.org/ex#test>
→ <http://example.org/ex#test>
<HttP://example.org/ex#test>
→ <http://example.org/ex#test>
<http://MySelf@example.org/Examp#test>
→ <http://MySelf@example.org/Examp#test>
<http://Example.ORG/ex#test>
→ <http://example.org/ex#test>
<http://example.org/Examp#test>
→ <http://example.org/Examp#test>
<http://example.org/examp?Qua#test>
→ <http://example.org/examp?Qua#test>
<http://example.org/examp#TeSt>
→ <http://example.org/examp#TeSt>
<http://%aA@%AA%AB%AC%AD%AE/some/where/place>
→ <http://%AA@%AA%AB%AC%AD%AE/some/where/place>
<http://%aa%Ab%AC%aD%AE/some/where/place>
→ <http://%AA%AB%AC%AD%AE/some/where/place>
<http://myname@example.org/%Fa/%FB/%fC>
→ <http://myname@example.org/%FA/%FB/%FC>
<http://myname@example.org/%FA/%FB/%FC?%ff>
→ <http://myname@example.org/%FA/%FB/%FC?%FF>
<http://myname@example.org/%FA/%FB/%FC#%ff>
→ <http://myname@example.org/%FA/%FB/%FC#%FF>
(normalise-iri-case)
<http://CRÊPES.example.org>
→ <http://crÊpes.example.org>
No equivalent for scheme, as that only contains ASCII even in IRIs.
(normalise-uri-escape)
<http://my!name@example.org/ex#test>
→ <http://my!name@example.org/ex#test>
<http://myname@!example.org/ex#test>
→ <http://myname@!example.org/ex#test>
<http://myname@example.org/ex!#test>
→ <http://myname@example.org/ex!#test>
<http://myname@example.org/ex?!a#test>
→ <http://myname@example.org/ex?!a#test>
<http://myname@example.org/ex?a#!test>
→ <http://myname@example.org/ex?a#!test>
<http://my%40name@example.org/ex#test>
→ <http://my%40name@example.org/ex#test>
<http://myname@ex%40ample.org/ex#test>
→ <http://myname@ex%40ample.org/ex#test>
<http://myname@example.org/e%40x#test>
→ <http://myname@example.org/e%40x#test>
<http://myname@example.org/ex?a%40#test>
→ <http://myname@example.org/ex?a%40#test>
<http://myname@example.org/ex?a#t%40est>
→ <http://myname@example.org/ex?a#t%40est>
<http://my%2Ename@example.org/ex?a#test>
→ <http://my.name@example.org/ex?a#test>
<http://myname@example%2Eorg/ex?a#test>
→ <http://myname@example.org/ex?a#test>
<http://myname@example.org/misc%2Etxt#test>
→ <http://myname@example.org/misc.txt#test>
<http://myname@example.org/misc.txt?%2E%2E%2E>
→ <http://myname@example.org/misc.txt?...>
<http://myname@example.org/misc.txt#line%31%30>
→ <http://myname@example.org/misc.txt#line10>
<http://dosh£@crepes.example.org>
→ <http://dosh%C2%A3@crepes.example.org>
<http://crêpes.example.org>
→ <http://cr%C3%AApes.example.org>
<http://crepes.example.org/in/Rhône>
→ <http://crepes.example.org/in/Rh%C3%B4ne>
<http://crepes.example.org/in/Rennes?Dim.‥Sam.>
→ <http://crepes.example.org/in/Rennes?Dim.%E2%80%A5Sam.>
<http://crepes.example.org/in/Rennes#L'Étage>
→ <http://crepes.example.org/in/Rennes#L'%C3%89tage>
(normalise-iri-escape)
<http://dosh%C2%A3@crepes.example.org>
→ <http://dosh£@crepes.example.org>
<http://cr%C3%AApes.example.org>
→ <http://crêpes.example.org>
<http://crepes.example.org/in/Rh%C3%B4ne>
→ <http://crepes.example.org/in/Rhône>
<http://crepes.example.org/in/Rennes?Dim.%E2%80%A5Sam.>
→ <http://crepes.example.org/in/Rennes?Dim.‥Sam.>
<http://crepes.example.org/in/Rennes#L'%C3%89tage>
→ <http://crepes.example.org/in/Rennes#L'Étage>
<https://en.wiktionary.org/wiki/%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82>
→ <https://en.wiktionary.org/wiki/Ῥόδος>
<https://example.org/music/%C3%89irigh'sCuirOrtDoChuid%C3%89adaigh>
→ <https://example.org/music/Éirigh'sCuirOrtDoChuidÉadaigh>
<https://en.wiktionary.org/wiki/Ῥόδος>
→ <https://en.wiktionary.org/wiki/Ῥόδος>
<https://en.wiktionary.org/wiki/Ῥόδος>
→ <https://en.wiktionary.org/wiki/Ῥόδος>
In this test case, normalise-iri-escape should be called multiple times, i.e. (compose normalise-iri-escape normalise-iri-escape)
(normalise-uri-path-segments and normalise-iri-path-segments)
<http://example.org/some/where/place>
→ <http://example.org/some/where/place>
<urn:/some/where/place>
→ <urn:/some/where/place>
<urn:some/where/place>
→ <urn:some/where/place>
<urn:/some/./where/././place/./>
→ <urn:/some/where/place/>
<urn:some/./where/././place/./>
→ <urn:some/where/place/>
<urn:/some//where//place//>
→ <urn:/some//where//place//>
<urn:some//where//place//>
→ <urn:some//where//place//>
</>
→ </>
<//>
→ <//>
</a/b/../../c>
→ </a/b/../../c>
</a/b/././c>
→ </a/b/././c>
</a/b/../c/././d>
→ </a/b/../c/././d>
<a/b/../../c>
→ <a/b/../../c>
<a/b/././c>
→ <a/b/././c>
<a/b/../c/././d>
→ <a/b/../c/././d>
<./def>
→ <./def>
<./abc:def>
→ <./abc:def>
<../../abc/./def>
→ <../../abc/./def>
<foo:a/b/../.././../../e>
→ <foo:e>
From Haskell network-uri [3]
<http://example.com////../..>
→ <http://example.com//>
From Webkit [5]
<http://example.com/foo/bar//../..>
→ <http://example.com/foo/>
From Webkit [5]
<http://example.com/foo/bar//..>
→ <http://example.com/foo/bar/>
From Webkit [5]
<http://example/a/b/../../c>
→ <http://example/c>
From Haskell network-uri [3]
<http://example/a/b/c/../../>
→ <http://example/a/>
From Haskell network-uri [3]
<http://example/a/b/c/./>
→ <http://example/a/b/c/>
From Haskell network-uri [3]
<http://example/a/b/c/.././>
→ <http://example/a/b/>
From Haskell network-uri [3]
<http://example/a/b/c/d/../../../../e>
→ <http://example/e>
From Haskell network-uri [3]
<http://example/a/b/c/d/../.././../../e>
→ <http://example/e>
From Haskell network-uri [3]
<http://example/a/b/../.././../../e>
→ <http://example/e>
From Haskell network-uri [3]
(uri->iri)
<https://en.wiktionary.org/wiki/%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82>
→ <https://en.wiktionary.org/wiki/Ῥόδος>
<https://example.org/ceol/%C3%89irigh'sCuirOrtDoChuid%C3%89adaigh>
→ <https://example.org/ceol/Éirigh'sCuirOrtDoChuidÉadaigh>
(iri->uri)
<https://en.wiktionary.org/wiki/Ῥόδος>
→ <https://en.wiktionary.org/wiki/%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82>
<https://example.org/ceol/Éirigh'sCuirOrtDoChuidÉadaigh>
→ <https://example.org/ceol/%C3%89irigh'sCuirOrtDoChuid%C3%89adaigh>

| Test | URI reference | scheme | user | host | port | path | query | fragment |
|---|---|---|---|---|---|---|---|---|
| Empty URI | <> | N/A | #f | #f | #f | #f | #f | #f |
| Empty authority | <//> | N/A | #f | "" | #f | #f | #f | #f |
| Empty user | <//@> | N/A | "" | #f | #f | #f | #f | #f |
| Empty port | <//:> | N/A | #f | "" | #f | #f | #f | #f |
| Empty query | <?> | N/A | #f | #f | #f | #f | "" | #f |
| Empty fragment | <#> | N/A | #f | #f | #f | #f | #f | "" |
| Path which looks like a hostname | <example.org> | N/A | #f | #f | #f | "example.org" | #f | #f |
| URN-like | <urn:something> | "urn" | #f | #f | #f | "something" | #f | #f |
| URN-like, path looks like hostname | <urn:example.org> | "urn" | #f | #f | #f | "example.org" | #f | #f |
| Path which looks like a URN | <./urn:something> | N/A | #f | #f | #f | "./urn:something" | #f | #f |
| User with colon segment | <http://a:b@c:29> | "http" | "a:b" | "c" | 29 | #f | #f | #f |
| User-like component appears as path | <http::@c:29> | "http" | #f | #f | #f | ":@c:29" | #f | #f |
| Host-like component appears as user | <http://example.org:b@d/> | "http" | "example.org:b" | "d" | #f | "/" | #f | #f |
| Padded port as numeric value | <http://example.org:000080> | "http" | #f | "example.org" | 80 | #f | #f | #f |
| Query component with question mark | <http://example.org/abcd?efgh?ijkl> | "http" | #f | "example.org" | #f | "/abcd" | "efgh?ijkl" | #f |
| Fragment component with question mark | <http://example.org/abcd#efgh?ijkl> | "http" | #f | "example.org" | #f | "/abcd" | #f | "efgh?ijkl" |
| Path where first segment looks like host | <http:///some/where/place> | "http" | #f | "" | #f | "/some/where/place" | #f | #f |
| Scheme with nil host | <foo:> | "foo" | #f | #f | #f | #f | #f | #f |
| Scheme with path, empty host | <foo:////g> | "foo" | #f | "" | #f | "//g" | #f | #f |
| Scheme with path, nil host | <foo:.///g> | "foo" | #f | #f | #f | ".///g" | #f | #f |
| Scheme with non-empty host | <foo://g> | "foo" | #f | "g" | #f | #f | #f | #f |
| All components filled out | <http://user@example.org:80/some/where/place?qua#ought> | "http" | "user" | "example.org" | 80 | "/some/where/place" | "qua" | "ought" |
| All components except user filled out | <http://example.org:80/some/where/place?qua#ought> | "http" | #f | "example.org" | 80 | "/some/where/place" | "qua" | "ought" |
| All components except host filled out | <http://user@:80/some/where/place?qua#ought> | "http" | "user" | #f | 80 | "/some/where/place" | "qua" | "ought" |
| All components except port filled out | <http://user@example.org/some/where/place?qua#ought> | "http" | "user" | "example.org" | #f | "/some/where/place" | "qua" | "ought" |
| All components except path filled out | <http://user@example.org:80?qua#ought> | "http" | "user" | "example.org" | 80 | #f | "qua" | "ought" |
| All components except query filled out | <http://user@example.org:80/some/where/place#ought> | "http" | "user" | "example.org" | 80 | "/some/where/place" | #f | "ought" |
| All components except fragment filled out | <http://user@example.org:80/some/where/place?qua> | "http" | "user" | "example.org" | 80 | "/some/where/place" | "qua" | #f |
| Empty host, nil user/port | <http:///some/where/place?qua#ought> | "http" | #f | "" | #f | "/some/where/place" | "qua" | "ought" |
| Empty user, nil host/port | <http://@/some/where/place?qua#ought> | "http" | "" | #f | #f | "/some/where/place" | "qua" | "ought" |
| Empty port implies empty host | <http://:/some/where/place?qua#ought> | "http" | #f | "" | #f | "/some/where/place" | "qua" | "ought" |
| Relative reference, empty host | <////g> | N/A | #f | "" | #f | "//g" | #f | #f |
| Relative reference, path, nil host | <.///g> | N/A | #f | #f | #f | ".///g" | #f | #f |
| Relative reference, non-empty host | <//g> | N/A | #f | "g" | #f | #f | #f | #f |
| Path which looks like a query | <./p=q:r> | N/A | #f | #f | #f | "./p=q:r" | #f | #f |
| Relative reference, all components filled out | <//user@example.org:80/some/where/place?qua#ought> | N/A | "user" | "example.org" | 80 | "/some/where/place" | "qua" | "ought" |
| Relative reference, all components except user filled out | <//example.org:80/some/where/place?qua#ought> | N/A | #f | "example.org" | 80 | "/some/where/place" | "qua" | "ought" |
| Relative reference, all components except host filled out | <//user@:80/some/where/place?qua#ought> | N/A | "user" | #f | 80 | "/some/where/place" | "qua" | "ought" |
| Relative reference, all components except port filled out | <//user@example.org/some/where/place?qua#ought> | N/A | "user" | "example.org" | #f | "/some/where/place" | "qua" | "ought" |
| Relative reference, all components except path filled out | <//user@example.org:80?qua#ought> | N/A | "user" | "example.org" | 80 | #f | "qua" | "ought" |
| Relative reference, all components except query filled out | <//user@example.org:80/some/where/place#ought> | N/A | "user" | "example.org" | 80 | "/some/where/place" | #f | "ought" |
| Relative reference, all components except fragment filled out | <//user@example.org:80/some/where/place?qua> | N/A | "user" | "example.org" | 80 | "/some/where/place" | "qua" | #f |
| Relative reference, empty host, nil user/port | <///some/where/place?qua#ought> | N/A | #f | "" | #f | "/some/where/place" | "qua" | "ought" |
| Relative reference, empty user, nil host/port | <//@/some/where/place?qua#ought> | N/A | "" | #f | #f | "/some/where/place" | "qua" | "ought" |
| Relative reference, empty port implies empty host | <//:/some/where/place?qua#ought> | N/A | #f | "" | #f | "/some/where/place" | "qua" | "ought" |
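
Several rows in the table above turn on the difference between a component that is absent and one that is present but empty. As a rough illustration, the Python sketch below splits a reference with the regular expression given in RFC 3986, Appendix B. It is not the sample implementation: the name `split_reference` is hypothetical, the authority is left unsplit (no user, host, or port), and an empty path is reported as `""` rather than as `#f`, unlike the table.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(reference: str) -> dict:
    """Split a URI reference into its five generic components.

    Hypothetical helper for illustration. Components absent from the
    input are None; components that are present but empty are "".
    """
    # Every part of the pattern is optional, so the match always succeeds.
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }
```

For <http::@c:29> the sketch reports scheme `"http"`, no authority, and path `":@c:29"`, in line with the corresponding row of the table. For <?> the query is `""` while the fragment is `None`.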
The sample implementation is written in portable R6RS, but uses chibi parse for parsing and imports SRFIs from the Chez-SRFI grab-bag.
network-uri package.
© 2026 Duncan Guthrie.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.