Re: #\a octothorpe syntax vs SRFI 10
Aubrey Jaffer wrote:
>>> Modern CPUs are almost always (I/O-bound or) limited by their memory
>>> bandwidth through the cache. If you double or quadruple the data
>>> movement necessary, you will execute at half or quarter the speed.
>> That depends on your data access patterns and cache sizes. If your
>> working set still fits in L1 cache after aligning the data, you get
>> better performance, and some of the big servers on the market now
>> have huge amounts of L1 cache.
> The largest I found was 32.kiB inst + 64.kiB data on Suns
> A stripped down SCM interpreter is 40.kB, but even if the interpreter
> fit, it would be getting swapped out to bring in SUBRs. 64.kiB would
> be a much better fit.
I misremembered slightly; I was actually thinking of the 256KB L2 cache
on the Itanium 2, which Intel describes as similar in performance to
most L1 caches. Here are the stats for its 3 cache levels:
Cache   Size      Load latency
L1      32KB      1 cycle direct, 2 cycles indirect
L2      256KB     ~6 cycles
L3      1.5-9MB   12+ cycles
While I wouldn't say that ~6 cycles is comparable to most L1 caches,
it's still very fast. Put that cache in a system where byte access
requires a word load, mask, and shift (about 3-5 cycles), and you might
just break even. You'd save about 2-3 cycles per load in exchange for a
greater number of 6-cycle L2 loads, which is about where the trade
starts to look viable.
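To put rough numbers on that break-even point, here is a back-of-envelope sketch in C. The model is an assumption made for illustration, not a measurement: every load that spills from L1 to L2 because the aligned data is bigger costs the latency difference, and every load saves the mask-and-shift cycles. The function name is hypothetical.

```c
#include <assert.h>

/* Illustrative model only (an assumption, not measured data): returns the
   extra fraction of loads that may fall through from L1 to L2 before the
   cycles saved by skipping mask-and-shift are used up. */
static double break_even_extra_l2_fraction(double saved_cycles_per_load,
                                           double l1_latency,
                                           double l2_latency)
{
    assert(l2_latency > l1_latency);   /* L2 must be the slower level */
    /* Each spilled load pays the latency difference; each load saves
       saved_cycles_per_load. Break-even is where the two balance. */
    return saved_cycles_per_load / (l2_latency - l1_latency);
}
```

With the figures above (2-3 cycles saved, 1-cycle L1, ~6-cycle L2), this crude model says the aligned layout stays ahead until roughly 40-60% of loads have moved out to L2.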
Almost forgot -- eliminating the mask & shift operations also reduces
pressure on the instruction cache, for more savings.
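For anyone who hasn't written it out, this is roughly what the word load, mask, and shift sequence looks like, sketched in C. The helper names and the 32-bit word size are assumptions chosen for illustration; real byte numbering within a word depends on the architecture, and here byte 0 is simply the least significant byte of the word's value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch byte `i` from bytes packed four to a 32-bit
   word, the way a machine without byte loads would have to do it. */
static uint8_t load_byte_packed(const uint32_t *words, size_t i)
{
    assert(words != NULL);
    uint32_t word = words[i / 4];              /* word load */
    unsigned shift = (unsigned)(i % 4) * 8;    /* bit offset of the byte */
    return (uint8_t)((word >> shift) & 0xFFu); /* shift, then mask */
}

/* Hypothetical helper: the same data stored one byte per word ("aligned").
   A single load with no shift or mask, at four times the memory footprint. */
static uint8_t load_byte_aligned(const uint32_t *padded, size_t i)
{
    assert(padded != NULL);
    return (uint8_t)padded[i];
}
```

The packed version spends extra instructions on every access; the aligned version spends extra cache space instead, which is the trade-off under discussion.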
>> In some architectures, byte-aligned access may even be more
>> expensive than L2 cache.
> That sounds like a poorly designed CPU.
Or a big, fast L2 cache like the Itanium's.
> Disk-based b-trees, used extensively for database indexes, are an
> important example of byte-intensive algorithms not tied to text
> strings. Other examples are cryptography and data-compression.
Thanks; I didn't know that.
Bradd W. Szonye