Re: Locks and the FSB

Michael Blizek <michi1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Thu, 27 Nov 2008 21:38:03 +0100

Hi!

On 17:22 Wed 26 Nov     , Elad Lahav wrote:
> I am looking into some scalability issues on a 4-way Xeon machine (4 
> separate CPUs, not cores). I believe I have tracked down the problem to 
> bus contention: OProfile results suggest a strong correlation between 
> instructions reporting a high number of global_power_events and 
> FSB_data_activity events.

FSB contention can be a serious bottleneck if you have 4 Xeons sharing the
same FSB. But I guess that the correlation between global_power_events and
FSB_data_activity alone does not say much about the contention. 4 idle CPUs
will not contend for the bus very easily.

> Some of these events can be easily explained. However, what surprises me, 
> is that certain lock operations seem to cause considerable lock activity. 
> For example, a call to spin_unlock_irqsave() from e1000_xmit_frame(). The 
> strange thing about it is that the experiments I am conducting strictly 
> partition the NICs among the CPUs (interrupt and process affinity), so 
> that there is no contention on the lock (verified with lockstat).

The interrupts apply only to the receive side. Any CPU may put data into the
qdisc and I think any CPU may take data off the qdisc and send it. Have you set
the process affinity so that the sending process runs on the same CPU the
interrupt is raised on?
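
A minimal userspace sketch of what I mean by process affinity, assuming Linux
with glibc. The helper names (pin_to_cpu, is_pinned_to, first_allowed_cpu) are
made up for illustration. The IRQ side is set separately, through
/proc/irq/<n>/smp_affinity.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_setaffinity() and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: pin the calling thread to exactly one CPU.
 * Returns 0 on success, -1 on error (errno is set by the kernel). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: return 1 if the calling thread is allowed to run
 * on `cpu` and on no other CPU, 0 otherwise. */
static int is_pinned_to(int cpu)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}

/* Hypothetical helper: lowest-numbered CPU the calling thread may run on,
 * or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

In the sending process you would call pin_to_cpu() with the number of the CPU
that services the NIC's interrupt before starting the transmit loop, and could
check the result with is_pinned_to().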

> My understanding suggests that the variable of a lock that is only 
> accessed by a single CPU should be constantly in the CPU's cache in 
> Modified mode, as no other CPU is ever invalidating it, and thus there 
> should be little if any FSB activity due to access to this variable.

A cache line may contain more than a single spinlock. If another CPU accesses
or modifies any other data in the same cache line, the line will not stay in
Modified state in your CPU's cache.
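
A small sketch of that layout problem in plain C11. The 64-byte line size is
an assumption (check your CPU), the struct names are invented, and the int
only stands in for a real spinlock. In kernel code the usual way to give a
field its own line is ____cacheline_aligned_in_smp from <linux/cache.h>.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumption, not queried from the hardware */

/* Lock and an unrelated counter packed together: both live in one cache
 * line, so a write to `counter` from another CPU also takes the line
 * holding `lock` away from the CPU that owns the lock. */
struct shared_line {
    _Alignas(CACHE_LINE_SIZE) volatile int lock;   /* stand-in for a spinlock */
    uint64_t counter;
};

/* Same fields, but the counter starts on its own cache line. */
struct split_lines {
    _Alignas(CACHE_LINE_SIZE) volatile int lock;   /* stand-in for a spinlock */
    _Alignas(CACHE_LINE_SIZE) uint64_t counter;
};

/* Compile-time checks of the two layouts. */
static_assert(offsetof(struct shared_line, counter) < CACHE_LINE_SIZE,
              "counter shares the lock's cache line");
static_assert(offsetof(struct split_lines, counter) >= CACHE_LINE_SIZE,
              "counter is on a different cache line than the lock");

/* Return 1 if both addresses fall into the same CACHE_LINE_SIZE block. */
static int same_cache_line(const volatile void *a, const volatile void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}
```

For an instance s of struct shared_line, same_cache_line(&s.lock, &s.counter)
is true, while for struct split_lines it is false. The price of the second
layout is that the struct grows to two full cache lines.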

> It 
> should be noted that the number of FSB events per-cpu increases 
> considerably when moving from 3 to 4 CPUs,

This is strange. Maybe some other code raised it from "very low" to "low"?
But this does not explain the long spin_unlock_irqsave() calls.

> while the number of cache 
> misses stays roughly the same.

This and the lockstat output seem to indicate that there is no lock
contention. Can you send some oprofile data?
	-Michi
-- 
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com
