[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: another operation




Hello,

>  | Concerning complexity, you are right: Log2-binary-factors needs
>  | time O(n) for an n-bit argument, whether in Scheme or on the bare
>  | metal.
>
> Iterating from the LSB, if the density of ones is d, then the
> probability of k zero LSBs is d^k.  The average number of iterations
> for long numbers would be O(1/log(d)).  So it is faster than linear!


Well, nearly. Please let me correct:

1. You assume i.i.d. random bits with density of ones d, where 0 < d < 1.
Then the number K_b of bits to inspect until the first '1' is found
is geometrically distributed, in this case Pr{K_b = k} = (1 - d)^(k-1) d, for k >= 1.
The expected number of bits inspected is E{K_b} = 1/d.

2. Now consider words of w bits each, and again the bits are i.i.d.
random variables with density of ones d, where 0 < d < 1.
Then the probability that a given word is non-zero (any bit is '1') is p = 1 - (1 - d)^w.
The number K_w of words to inspect until the first non-zero word is found
is again geometrically distributed, this time Pr{K_w = k} = (1 - p)^(k-1) p, k >= 1.
The expected number of words inspected is E{K_w} = 1/p, which tends
to 1/(w d) as d tends to zero and w is a constant.

This tells us a few things:
a) For i.i.d. random bits the runtime for long strings only depends on the
density of ones (no surprise); in this sense it is "faster than linear".
b) Processing w bits in parallel gives about factor w speed-up.
c) The "1/log(d)" you mention is not correct.

The problem with the "fast than linear" thing is that it only holds if the bits
are i.i.d. random. That is hard to justify in most applications. (In fact,
I wouldn't go near it.)

Anyhow, in the worst-case (numbers of the form 2^k, k >= 0) the algorithm inspects
all bits, but most of the time I would like to think of it as being constant time.

> I grepped through gambc30 and gambc40b12 without finding
> first-bit-set.

Oops, I had simply scanned the documentation for the name and found it at:

        gambc40b12.tar.gz > gambit-c.pdf > p. 44

(Remember: "If all else has failed, read the manual.")


Ah! I wrote FIRST-BIT-SET but Gambit really calls it FIRST-SET-BIT.

> How about COUNT-BINARY-FACTORS?

I still like FIRST-BIT-SET best, or something as LSB-BIT-SET.
Actually, I do not care too much about the name.

See you,

Sebastian.

----
Dr. Sebastian Egner
Senior Scientist
Philips Research Laboratories
Prof. Holstlaan 4 (postbox WY63, kamer WY 6-55)
5656 AA Eindhoven
The Netherlands
tel:       +31 40 27-42069
fax:      +31 40 27-42566
email: sebastian.egner@xxxxxxxxxxx