[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Storage Efficiency of Vectors

This page is part of the web mail archives of SRFI 43 from before July 7th, 2015. The new archives for SRFI 43 contain all messages, not just those from before July 7th, 2015.



Hello,

Points have been made regarding the storage efficiency of vectors. I've
got a couple of comments.

First, I'm not sure that association vectors will save storage. Unless
I'm very much mistaken, a typical implementation requires an overhead of
two words for a pair and N+1 words for a vector of length N (with the
additional word storing the length). Thus, an association list has four
words of overhead per key, two in the spine and two in the rib, and an
association vector also has (asymptotically) four words of overhead per
key, one in the spine and three in the rib. If you're really bothered
about the storage overhead, a "property vector", with alternating keys
and values, or simply one vector for the keys and another for the values
are probably better ideas. Both of these have (asymptotically) two words
of overhead per key.

I'm also not sure that "stretchy" vectors will help much in the common
case that a vector of an initially unknown size is constructed and then
its size is not changed. Let's consider constructing a vector of length
M = 1.5 * 2^N. I've chosen this length as a median case appropriate to
constructing a vector by creating a fresh vector of twice the previous
size when the previous vector becomes full. If we construct a list first
and then create an appropriate vector, we will allocate 3M words, 2M for
the list and M for the vector. (If we make the mistake of using "(apply
vector list)", we require 5M!) If we construct the vector by repeated
doubling, we will allocate (1+1) + (2+1) + (4+1) + ... + (2^(N+1)+1) =
2^(N+2) + N + 1 = (8/3) M words (ignoring the logarithmic term). So,
unless I'm making some mistake here, it would appear that in the median
case stretchy vectors save us about 10%.

For smaller than median cases, lists are slightly better and for larger
than median cases, stretchy vectors are slightly better. However, even
in the best case for stretchy vectors, they will allocate only 1/3 less
storage than lists (2M versus 3M), and in the worst case they will
allocate 1/3 more (4M versus 3M). So, to some degree which is better
comes down to the distribution of vector lengths. However, we're talking
about factors of less than two.

Of course, stretchy vectors are an enormous convenience and will help if
the vector is subsequently resized. Furthermore a clever allocator that
can grow a vector in place will change my analysis. But in this common
case and for a naive allocator, I would venture to paraphrase David
Thornley and say that lists don't look any deader than usual to me.

Regards,

Alan Watson
-- 
Dr Alan Watson
Instituto de Astronomía UNAM