[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: performance



Taylor R Campbell wrote:
   Date: Thu, 17 Sep 2009 20:53:55 -0400
   From: David Van Horn <dvanhorn@xxxxxxxxxxxxxxx>

   Taylor R Campbell wrote:
   > What space usage are random-access lists guaranteed to exhibit?
   > Growth order and constant factors are both interesting -- growth order
   > to specify in the SRFI, and constant factors to satisfy my curiosity.

Representing a random-access list of n elements takes O(n) space, just as with sequential lists.

I ought to have been more precise: while of course it takes O(n) space
to store a sequence of n elements, I was interested more in the amount
of extra space.  E.g., in most systems, a vector of length n will use
a constant amount of extra space (a header with its length at the
beginning), while a list of length n will use O(n) extra space (one
car for each element, plus one cdr extra space for each element, and
sometimes another word for a header or similar).

Yes, hence the following paragraph which gives the number of cons cells need to represent a list of size n.

A random-access list is a forest of complete binary trees. The forest contains at most log n trees, and the space of each tree is proportional to the number of elements it contains. If you represent binary trees using pairs [*], then it takes m-1 pairs to represent a complete binary tree with m elements. The forest can be represented as a list, and thus takes a pair per tree, which is at most log n. So the overall space consumption need not be more than n + log n. My implementation stores the size of each tree in the forest making it n + 2 * log n.

I'm a little puzzled by the formulae you've computed here.  If my
cursory examination of the reference implementation is right, the
trees have data associated with each internal node, not with the
leaves.  So you need one location for each element, one location for
each left branch, and one location for each right branch, for a total
of 2 n extra locations in each tree, plus whatever constant overhead
each tree has: + 1 for the size, + 1 for the node, + 1 for the link to
the next tree?  This sounds like a total space usage of 3 n + 3 log n,
neglecting the overhead of records, which means an extra space usage
of 2 n + 3 log n = O(n).  I suppose I ought to have expected that,
though.

The formula is given in terms of the number of pairs needed. I figured you could extrapolate from that. Are you asking for something else?

You need not use records in an implementation, but there is no portable way to define a pair type that is distinct from the built-in pair type. So if there is overhead for records, it can be avoided.

You are correct that the binary trees have data associated at each node of the tree not just the leaves. A tree of height 1 (a leaf) has no overhead. A tree of height 2 has 3 elements and requires 2 cons cells. A tree of height 3 has 7 elements and requires 6 cons cells. In general a tree of height h has (- (expt 2 h) 1) elements and requires (- (expt 2 h) 2) cons cells. Additionally, you need a list for the forest and so a cons cell per tree, and my implementation stores the size, so you need another cons cell per tree. There are at most log n trees. You can trade space for time in the case of the latter and compute the size within the time bounds required in the SRFI.

In short, the space consumption has at worst a logarithmic overhead compared to a plain ol' list. Right? I'm still not sure I'm answering the question you're asking.

David