Re: #\a octothorpe syntax vs SRFI 10

This page is part of the web mail archives of SRFI 58 from before July 7th, 2015. The new archives for SRFI 58 contain all messages, not just those from before July 7th, 2015.



Bear wrote:
>> Many recent RISC processors have no 8-bit operations.  Some in the
>> fairly near future will probably also lack 16-bit operations.  It
>> would be far more efficient for these systems to allocate 16 bits
>> where an 'au8' is requested;

Aubrey Jaffer wrote:
> No, it won't.  Modern CPUs are almost always (I/O-bound or) limited by
> their memory bandwidth through the cache.  If you double or quadruple
> the data movement necessary, you will execute at half or quarter the
> speed.

That depends on your data access patterns and cache sizes. If your
working set still fits in L1 cache after aligning the data, you get
better performance, and some of the big servers on the market now have
huge amounts of L1 cache. In some architectures, a byte-aligned access
may even cost more than going out to L2 cache.
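
To make the working-set point concrete, here is a rough sketch of the
arithmetic. The 32 KiB L1 size and the helper names are assumptions made
up for illustration; they are not figures from this thread or from any
particular CPU.

```python
# Assumed L1 data cache size, chosen only for illustration.
L1_BYTES = 32 * 1024


def working_set_bytes(n_elements: int, bytes_per_element: int) -> int:
    """Total bytes touched when each element occupies bytes_per_element."""
    return n_elements * bytes_per_element


def fits_in_cache(n_elements: int, bytes_per_element: int,
                  cache_bytes: int = L1_BYTES) -> bool:
    """True if the whole array fits in a cache of cache_bytes."""
    return working_set_bytes(n_elements, bytes_per_element) <= cache_bytes


# 8,000 elements still fit at 1, 2, or 4 bytes each, so widening them
# costs no extra cache misses under this assumption.
small = [fits_in_cache(8_000, width) for width in (1, 2, 4)]

# 20,000 elements fit only when packed as bytes; widening them to
# 16 bits pushes the working set past the assumed L1 size.
large = [fits_in_cache(20_000, width) for width in (1, 2, 4)]
```

Under these assumptions the small array is indifferent to padding, while
the large one starts spilling out of L1 as soon as each byte is widened,
which is the case Aubrey describes.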

However, there is still a good practical argument against word-aligning
byte arrays. Text strings are the main application for byte arrays. If
you're worried about the performance of byte strings, and you have a
big enough, fast enough cache, you could word-align the characters. But
why do that when you could use a text encoding /designed/ for word
alignment?

For an extreme example, it makes no sense at all to word-align UTF-8
text; UTF-32 is simpler and more compact than UTF-8 padded out to one
word per byte. The only reason to use UTF-8 at all is to conserve
space. Likewise, it makes little sense to word-align the Asian "shift"
encodings like the EUC character sets. Again, UTF-32 is simpler and
more compact, and it's also more general. As a rule, word alignment
doesn't make sense for "multibyte" character sets. If you're going to
spend whole words per text element, you should use an encoding that
doesn't waste most of the bits.
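
A quick way to check the size comparison, as a sketch only: it assumes a
32-bit word and a hypothetical layout in which every UTF-8 byte gets its
own word. The function names are invented for this illustration.

```python
def word_aligned_utf8_size(text: str, word_bytes: int = 4) -> int:
    """Bytes used if each UTF-8 code unit is padded to a full word."""
    return len(text.encode("utf-8")) * word_bytes


def utf32_size(text: str) -> int:
    """Bytes used by UTF-32: one 4-byte unit per code point, no BOM."""
    return len(text.encode("utf-32-le"))


# Mixed sample: five Latin letters (one of them U+00EF), a space, and
# three CJK characters. That is 9 code points and 16 UTF-8 bytes.
sample = "na\u00efve \u65e5\u672c\u8a9e"

padded = word_aligned_utf8_size(sample)   # 16 bytes * 4 = 64
native = utf32_size(sample)               # 9 code points * 4 = 36

# Pure ASCII is the break-even case: one byte per character either way.
ascii_padded = word_aligned_utf8_size("abc")  # 12
ascii_native = utf32_size("abc")              # 12
```

Word-aligned UTF-8 is never smaller than UTF-32 under this layout, and
it gets worse the more multibyte characters the text contains.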

That's somewhat less true for single-byte encodings like ASCII and the
ISO 8859 "Latin" series. If you really only need 256 characters, full
Unicode support brings in some baggage that you may not want. I can
imagine applications that do better with word-aligned ASCII than with
UTF-32, but I expect that such applications are very rare, and therefore
not a good reason to word-align byte arrays in general.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd