Re: #\a octothorpe syntax vs SRFI 10

This page is part of the web mail archives of SRFI 58 from before July 7th, 2015. The new archives for SRFI 58 contain all messages, not just those from before July 7th, 2015.

To: srfi-58@xxxxxxxxxxxxxxxxx
Subject: Re: #\a octothorpe syntax vs SRFI 10
From: "Bradd W. Szonye" <bradd+srfi@xxxxxxxxxx>
Date: Wed, 5 Jan 2005 04:44:19 -0800
Delivered-to: srfi-58@xxxxxxxxxxxxxxxxx
In-reply-to: <20041231013816.D56251B7711@xxxxxxxxxxxxxxxx>
Mail-followup-to: srfi-58@xxxxxxxxxxxxxxxxx
References: <Pine.LNX.4.44.0412262207080.10074-100000@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <Pine.LNX.4.58.0412270956290.31229@xxxxxxxxxxxxxx> <20041229044152.397161B7710@xxxxxxxxxxxxxxxx> <20041229221404.GB6850@xxxxxxxxxxxxxxx> <20041231013816.D56251B7711@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.4.1i

Aubrey Jaffer wrote:
>>> Modern CPUs are almost always (I/O-bound or) limited by their memory
>>> bandwidth through the cache.  If you double or quadruple the data
>>> movement necessary, you will execute at half or quarter the speed.

Bradd wrote:
>> That depends on your data access patterns and cache sizes.  If your
>> working set still fits in L1 cache after aligning the data, you get
>> better performance, and some of the big servers on the market now
>> have huge amounts of L1 cache.

> The largest I found was 32.kiB inst + 64.kiB data on Suns
> <http://www.sun.com/servers/family-comp.html>.
> A stripped down SCM interpreter is 40.kB, but even if the interpreter
> fit, it would be getting swapped out to bring in SUBRs.  64.kiB would
> be a much better fit.

I misremembered slightly; I was actually thinking of the 256KB L2 cache
on the Itanium 2, which Intel describes as similar in performance to
most L1 caches. Here are the stats for its 3 cache levels:

    Cache   Size      Load latency
    L1      32KB      1 cycle direct, 2 cycles indirect
    L2      256KB     ~6 cycles
    L3      1.5-9MB   12+ cycles

While I wouldn't say that ~6 cycles is comparable to most L1 caches,
it's still very fast. Put that cache in a system where byte access
requires a word load, mask, and shift (about 3-5 cycles), and you might
just break even. You'd save about 2-3 cycles per load in exchange for a
greater number of 6-cycle L2 loads, which is just enough for the idea to
start looking viable. Almost forgot -- eliminating the mask and shift
operations also reduces pressure on the instruction cache, for more
savings.
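
To make the "word load, mask, and shift" sequence concrete, here is a
rough C sketch of the two access styles being compared. The function
names, the 32-bit word size, and the choice of numbering bytes from the
low-order end of each word are assumptions for illustration only; they
are not taken from any particular CPU or Scheme implementation, and the
cycle counts above are not modeled.

```c
#include <stdint.h>

/* Hypothetical helper: fetch byte `i` from bytes packed four to a
   32-bit word, on a machine assumed to load only whole aligned words.
   Byte 0 is taken to be the low-order 8 bits of each word. */
static uint8_t load_byte_via_word(const uint32_t *words, uint32_t i)
{
    uint32_t word  = words[i >> 2];             /* 1. word load            */
    uint32_t shift = (i & 3u) * 8u;             /*    bit offset of byte i */
    return (uint8_t)((word >> shift) & 0xFFu);  /* 2. shift, 3. mask       */
}

/* Hypothetical alternative: one element per word. A single load does
   the job, at the price of four times the memory (and cache) footprint. */
static uint8_t load_byte_unpacked(const uint32_t *words, uint32_t i)
{
    return (uint8_t)words[i];
}
```

The packed version spends extra instructions on every access; the
unpacked version spends cache space instead, which is the trade
discussed above.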

>> In some architectures, byte-aligned access may even be more
>> expensive than L2 cache.

> That sounds like a poorly designed CPU.

Or a big, fast L2 cache like the Itanium's.

> Disk-based b-trees, used extensively for database indexes, are an
> important example of byte-intensive algorithms not tied to text
> strings.  Other examples are cryptography and data-compression.

Thanks; I didn't know that.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd
