
Re: Overuse of strings

This page is part of the web mail archives of SRFI 83 from before July 7th, 2015. The new archives for SRFI 83 contain all messages, not just those from before July 7th, 2015.

[I apologize - this message is somewhat off topic.]

Lauri Alanko wrote:
On Tue, Jan 24, 2006 at 11:51:34AM -0800, Per Bothner wrote:

What would using symbols and s-exp gain?  What kind of
operations would it make easier?

There are two different issues here: how should paths or URIs be
represented at run-time, and what kind of notation should be used for
giving literal values for them in code. As you are speaking about
"operations", I assume you mean the former here.

To me it is obvious: _all_ common operations on URIs are easier if you
have a structured representation instead of a flat string. Maybe the
most common operation is resolving a relative URI against a base URI. A
purely string-based implementation is a huge mess that involves
searching for slashes from right to left (but remembering that
consecutive slashes count as a single one),

Actually, two slashes define the "authority" part.

detecting ".." and "." segments and whatnot... it's the sort of thing
you expect to see only in C code.

It's not *that* complicated.  And note that the specification is in
terms of string operations, so making sure that a "structured"
implementation gives the correct results may actually be more difficult.
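
For a sense of what the string-based specification looks like, here is a minimal Python transcription of the dot-segment removal step of RFC 3986 (section 5.2.4). The RFC describes this step using an input buffer and an output buffer. The function name follows the RFC's own name for the step, but the code is only an illustrative sketch and is not taken from any library.

```python
def remove_dot_segments(path: str) -> str:
    """Illustrative string-buffer transcription of RFC 3986, section 5.2.4."""
    inp, out = path, ""
    while inp:
        # A: drop a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # B: a leading "/./" or a complete "/." becomes "/"
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        # C: a leading "/../" or a complete "/.." becomes "/" and
        #    removes the last segment (and its "/") from the output
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            out = out[: out.rfind("/")] if "/" in out else ""
        # D: a lone "." or ".." is discarded
        elif inp in (".", ".."):
            inp = ""
        # E: move the first segment, with its leading "/" if any, to the output
        else:
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]
    return out
```

On the RFC's worked example, remove_dot_segments("/a/b/c/./../../g") returns "/a/g".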

Any sane implementation will first parse the URI into its constituents
and form a list of path segments, and then operate on that list. It
would be just silly to constantly parse and unparse the URIs at every
operation, so it's better to have a distinct internal representation for
them. And indeed, this is why many languages do have special types or
classes for representing URIs.
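
The segment-list approach can be sketched in Python as shown below. resolve_path is a hypothetical helper and not a library function. It assumes an absolute base path and a non-empty reference, and it handles only the path component (no scheme, authority, query, or fragment). It therefore covers just the "merge" and dot-segment steps of resolution.

```python
def resolve_path(base: str, ref: str) -> str:
    """Hypothetical helper: resolve a relative path reference against an
    absolute base path by operating on a list of segments."""
    if ref.startswith("/"):
        segments = ref.split("/")
    else:
        # Merge: drop the base's last segment (its "file" part), append the reference.
        segments = base.split("/")[:-1] + ref.split("/")
    out = []  # out[0] == "" stands for the root of the absolute path
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg in (".", ".."):
            if seg == ".." and len(out) > 1:
                out.pop()  # go up one level, but never above the root
            if is_last:
                out.append("")  # a trailing "." or ".." leaves a trailing slash
        else:
            out.append(seg)
    return "/".join(out)
```

With the RFC 3986 example base path "/b/c/d;p", resolve_path("/b/c/d;p", "../g") gives "/b/g" and resolve_path("/b/c/d;p", "..") gives "/b/". These match the path parts of the RFC's examples.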

I don't disagree.  Though "parsing and unparsing for every operation"
is unlikely to be performance critical. Moreover, it may actually be
faster on modern computers, because a string is more compact and has
better locality.
(Remember that to a first approximation on modern computers
instructions take no time - it is cache misses that are expensive.)

What about "path names" (as used in file operations): Should they be
structured objects or strings?

Definitely objects. Nowadays PLT Scheme has built-in support for path
objects, but before that I used to use a simple library:
Here relative-path calculates the relative path from "from" to "to".
Would you like to do this kind of stuff using _strings_?
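
To illustrate list-based path manipulation, here is a hypothetical relative_path written as a Python sketch. It is not the library mentioned above. It assumes both arguments are absolute directory paths that have already been split into segment names.

```python
def relative_path(from_dir: list[str], to: list[str]) -> list[str]:
    """Hypothetical helper: segments leading from directory `from_dir` to `to`.
    Both are absolute paths given as lists of segment names."""
    # Count the segments the two paths have in common at the front.
    common = 0
    for a, b in zip(from_dir, to):
        if a != b:
            break
        common += 1
    # Climb out of the rest of from_dir, then descend into the rest of `to`.
    return [".."] * (len(from_dir) - common) + to[common:]
```

For example, relative_path(["usr", "share", "doc"], ["usr", "lib", "scheme"]) yields ["..", "..", "lib", "scheme"]. Identical paths give an empty list rather than ".", so a real library would need to choose a convention for that case.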

No - I want this to be hidden in my implementation, using appropriate
library procedures.

My actual preference is an abstract opaque "path" type with operations
that can map to and from URI strings.  So whether the internal
representation uses URI strings or lists should be an implementation detail.
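
One way to picture such an opaque type is the Python sketch below. The class OpaquePath and its methods are hypothetical. The sketch handles only absolute file: URIs without an authority. It happens to store a tuple of decoded segment names, but callers use only from_uri, to_uri, and the operations, so they could not tell if it stored the URI string instead.

```python
from urllib.parse import quote, unquote, urlsplit


class OpaquePath:
    """Hypothetical opaque path type; the representation is private."""
    __slots__ = ("_segments",)

    def __init__(self, segments):
        # Internal representation: a tuple of decoded segment names.
        self._segments = tuple(segments)

    @classmethod
    def from_uri(cls, uri: str) -> "OpaquePath":
        parts = urlsplit(uri)
        if parts.scheme != "file" or not parts.path.startswith("/"):
            raise ValueError(f"not an absolute file: URI: {uri!r}")
        # Split on the literal "/" first, then decode %XX escapes per segment,
        # so an escaped "%2F" stays inside a single name.
        return cls(unquote(s) for s in parts.path.split("/")[1:])

    def to_uri(self) -> str:
        # safe="" also escapes any "/" that occurs inside a name.
        return "file:///" + "/".join(quote(s, safe="") for s in self._segments)

    def child(self, name: str) -> "OpaquePath":
        return OpaquePath(self._segments + (name,))

    def parent(self) -> "OpaquePath":
        return OpaquePath(self._segments[:-1])
```

Each segment is escaped separately, so a name containing '/' survives a round trip as %2F.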

I just find it sad that underneath all these high-level conveniences,
the operating system still uses strings for paths in the system call
interface. As a result, '/' is an utterly magical character that cannot
appear in any file's name.

I agree.  Though I'm not sure how one would fix that, given that one
does want a displayable and printable external representation.  The
RFC solution allows you to escape special characters, which means
you've traded reserved '/' for reserved '%'.
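
Python's urllib.parse.quote and unquote make the trade-off visible. The example below uses a made-up file name that contains both '/' and '%'.

```python
import posixpath
from urllib.parse import quote, unquote

# A made-up file name containing both "/" and "%".
name = "50%/50 split"

# As a plain string, "/" silently acts as a separator: one name becomes two segments.
raw = posixpath.join("docs", name)
print(raw.split("/"))   # ['docs', '50%', '50 split']

# Percent-escaping with safe="" keeps the name in one segment ("/" -> "%2F")...
escaped = quote(name, safe="")
# ...but now "%" is reserved and must itself be escaped ("%" -> "%25").
print(escaped)          # 50%25%2F50%20split
assert unquote(escaped) == name
```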

There are good reasons to prefer strings (standard, universal, and
familiar, as listed above). At least it makes sense to read and print
pathnames using URI syntax.

Certainly it should be possible, but hardly the default.

Ignoring path-name literals (which I think are less frequent), you
still have to get pathnames from the user or the system.  S-expressions
as external syntax would still have to be validated, plus I don't
think they would be the right choice for user interfaces.

XML's surface
syntax is also standard, universal and familiar. Would you suggest that
XML data in Scheme code be therefore expressed with strings:
"<foo>bar<baz/></foo>" instead of, say, Xexprs: (foo "bar" (baz))?

The latter, with one caveat: In Kawa, XML data are represented with
special types, and I think this is needed to best match the XML data
model.  (Namespaces are one factor.)  What happens in Kawa is that:
  (foo "bar" (baz))
*evaluates* to XML data, but it isn't XML data in itself.
(Whether this distinction is worthwhile depends, of course, on what
you're trying to do.)
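
Here is a rough Python analogue of that distinction, which is not Kawa's actual mechanism. The nested tuple plays the role of the s-expression, and build, a hypothetical helper, evaluates it into a separate XML data type, in this case an xml.etree.ElementTree element. Namespaces and attributes are ignored, so the sketch does not capture the full XML data model mentioned above.

```python
import xml.etree.ElementTree as ET


def build(sexp):
    """Hypothetical helper: turn ("tag", item, ...) into an Element.
    String items become text; nested tuples become child elements."""
    tag, *items = sexp
    elem = ET.Element(tag)
    for item in items:
        if isinstance(item, str):
            if len(elem):
                # Text after a child element is stored as that child's tail.
                last = elem[-1]
                last.tail = (last.tail or "") + item
            else:
                elem.text = (elem.text or "") + item
        else:
            elem.append(build(item))
    return elem


# The tuple is just data; build() "evaluates" it to an XML element.
doc = build(("foo", "bar", ("baz",)))
print(ET.tostring(doc, encoding="unicode"))   # <foo>bar<baz /></foo>
```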
	--Per Bothner
per@xxxxxxxxxxx   http://per.bothner.com/