Re: Matrix multiplication: performance drop

John Fine <johnsfine@xxxxxxxxxxx> · Tue, 03 Mar 2009 11:44:52 -0500

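L2 cache is set-associative.  The cache size divided by the number of 
ways is a power of two, and addresses that many bytes apart (or any 
multiple of that) all compete for the same cache set.  Stepping through 
memory with a step size of that power of two (or a nearby power of two) 
will therefore cause far more cache misses than a non power of two step 
size.  With N = 2048 and the usual 4- or 8-byte elements, each matrix row 
is a power of two number of bytes, so walking down a column uses exactly 
such a step size.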
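
To make the effect concrete, here is a rough sketch that counts how many 
distinct cache sets one column of a row-major N x N matrix of 8-byte 
elements lands in.  The cache geometry (2 MiB, 8-way, 64-byte lines) is 
an assumption chosen for illustration, not a measurement of any 
particular CPU, and the function name is made up.  It also treats 
addresses as if they were contiguous, which ignores how physical pages 
are actually laid out.

```python
# Assumed cache geometry (hypothetical, for illustration only).
LINE_BYTES = 64                    # bytes per cache line
WAYS = 8                           # lines per set (associativity)
CACHE_BYTES = 2 * 1024 * 1024      # total L2 capacity
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)


def sets_touched_by_column(n, elem_bytes=8):
    """Count distinct cache sets hit while walking one column of a
    row-major n x n matrix, assuming the matrix starts at address 0."""
    stride = n * elem_bytes        # bytes between consecutive column elements
    return len({(i * stride // LINE_BYTES) % NUM_SETS for i in range(n)})


if __name__ == "__main__":
    for n in (2048, 2100):
        print(n, sets_touched_by_column(n))
```

Under these assumptions a column of N = 2048 falls into only 16 sets, so 
at most 16 * 8 = 128 of its 2048 lines can sit in the cache at once, 
while N = 2100 spreads the column over more than a thousand sets.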

Yury Serdyuk wrote:

why does unoptimized code work fine, say, for N = 2100,
but not for N = 2048 or, in general, for N a multiple of 512?
What is the magic number?