Hi!

On 17:22 Wed 26 Nov, Elad Lahav wrote:
> I am looking into some scalability issues on a 4-way Xeon machine (4
> separate CPUs, not cores). I believe I have tracked down the problem to
> bus contention: OProfile results suggest a strong correlation between
> instructions reporting a high number of global_power_events and
> FSB_data_activity events.

FSB contention can be a serious bottleneck if you have 4 Xeons sharing the
same FSB. But I guess that the correlation between global_power_events and
FSB_data_activity alone does not say much about the contention. 4 idle CPUs
will not contend for the bus very easily.

> Some of these events can be easily explained. However, what surprises me,
> is that certain lock operations seem to cause considerable lock activity.
> For example, a call to spin_unlock_irqsave() from e1000_xmit_frame(). The
> strange thing about it is that the experiments I am conducting strictly
> partition the NICs among the CPUs (interrupt and process affinity), so
> that there is no contention on the lock (verified with lockstat).

The interrupts apply only to the receive side. Any CPU may put data into the
qdisc, and I think any CPU may take data off the qdisc and send it. Have you
set the process affinity so that the sending process runs on the same CPU the
interrupt is raised on?

> My understanding suggests that the variable of a lock that is only
> accessed by a single CPU should be constantly in the CPU's cache in
> Modified mode, as no other CPU is ever invalidating it, and thus there
> should be little if any FSB activity due to access to this variable.

A cache line may contain more than a single spinlock. If another CPU accesses
or modifies any other data in the same cache line, the line will not stay in
Modified state in your CPU's cache.

> It
> should be noted that the number of FSB events per-cpu increases
> considerably when moving from 3 to 4 CPUs,

This is strange. Maybe some other code increased it from "very low" to "low"?
But this does not explain the long spin_unlock_irqsave() calls.

> while the number of cache
> misses stays roughly the same.

This and the lockstat results seem to indicate that there is no lock
contention. Can you send some oprofile data?

	-Michi
-- 
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com
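
P.S. To illustrate the cache-line point above, here is a minimal user-space sketch of how a lock and unrelated data can end up sharing a cache line, and how padding separates them. Everything in it is an assumption made for illustration: the 64-byte line size (CACHE_LINE), the stand-in struct fake_lock (not the kernel's spinlock_t), and the hypothetical struct packed_dev / struct padded_dev layouts. It is not taken from the e1000 driver.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. Check the actual CPU; some
 * processors effectively transfer larger units. */
#define CACHE_LINE 64

/* Hypothetical stand-in for a spinlock; not the kernel's spinlock_t. */
struct fake_lock {
    volatile int locked;
};

/* Packed layout: the lock and an unrelated counter sit next to each
 * other, so they normally fall into the same cache line. */
struct packed_dev {
    struct fake_lock tx_lock;   /* taken by one CPU only */
    unsigned long rx_packets;   /* written by a different CPU */
};

/* Padded layout: each member starts on its own cache line. */
struct padded_dev {
    alignas(CACHE_LINE) struct fake_lock tx_lock;
    alignas(CACHE_LINE) unsigned long rx_packets;
};

static_assert(sizeof(struct padded_dev) == 2 * CACHE_LINE,
              "padded layout should occupy exactly two cache lines");

/* Returns 1 if two member offsets fall into the same cache line,
 * assuming the struct itself starts on a cache-line boundary. */
static int same_cache_line(size_t off_a, size_t off_b)
{
    return off_a / CACHE_LINE == off_b / CACHE_LINE;
}

/* Lock and counter share a line in the packed layout. */
static int packed_shares_line(void)
{
    return same_cache_line(offsetof(struct packed_dev, tx_lock),
                           offsetof(struct packed_dev, rx_packets));
}

/* Lock and counter are on different lines in the padded layout. */
static int padded_shares_line(void)
{
    return same_cache_line(offsetof(struct padded_dev, tx_lock),
                           offsetof(struct padded_dev, rx_packets));
}
```

With the packed layout, a write to rx_packets from another CPU invalidates the line holding tx_lock, even though nobody else ever takes the lock. In kernel code the usual way to separate such fields is the ____cacheline_aligned_in_smp annotation; whether this applies to the lock in question would need to be checked against the actual struct layout.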