On Wed, Nov 26, 2008 at 11:22 PM, Elad Lahav <elad_lahav@xxxxxxxxxxxxxxxxxxxxx> wrote:
I am looking into some scalability issues on a 4-way Xeon machine (4 separate physical CPUs, not cores). I believe I have tracked the problem down to bus contention: OProfile results show that the instructions with a high number of global_power_events are largely the same ones with a high number of FSB_data_activity events.
Some of these events are easily explained. What surprises me, however, is that certain lock operations seem to cause considerable FSB activity, for example the call to spin_unlock_irqrestore() in e1000_xmit_frame(). This is strange because the experiments I am conducting strictly partition the NICs among the CPUs (both interrupt and process affinity), so there is no contention on the lock (verified with lockstat).
My understanding is that a lock variable accessed by only a single CPU should stay in that CPU's cache in the Modified state, since no other CPU ever invalidates it, so accesses to it should generate little if any FSB activity. Note that the number of FSB events per CPU increases considerably when moving from 3 to 4 CPUs, while the number of cache misses stays roughly the same.
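To make the access pattern I have in mind concrete, here is a minimal user-space sketch of it, not the driver code. The names demo_lock and run_uncontended are hypothetical, and the lock is a plain C11 atomic_flag spinlock rather than the kernel's spinlock_t. A single thread takes and releases a lock that nothing else touches, so under the reasoning above the lock's cache line should never have to leave the local cache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-NIC transmit lock (not the e1000 type). */
struct demo_lock {
    atomic_flag held;
};

static void demo_lock_init(struct demo_lock *l)
{
    atomic_flag_clear(&l->held);
}

static void demo_lock_acquire(struct demo_lock *l)
{
    /* Atomic read-modify-write on the lock's cache line. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* never spins here when the lock is uncontended */
}

static void demo_lock_release(struct demo_lock *l)
{
    /* Plain store with release ordering. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Acquire and release the lock `iters` times from the calling thread only.
 * Returns how many times the critical section was entered.
 */
static uint64_t run_uncontended(struct demo_lock *l, uint64_t iters)
{
    uint64_t counter = 0;

    for (uint64_t i = 0; i < iters; i++) {
        demo_lock_acquire(l);
        counter++;              /* the "critical section" */
        demo_lock_release(l);
    }

    /* With a single owner, every iteration enters exactly once. */
    assert(counter == iters);
    return counter;
}
```

Calling run_uncontended() from a small main() and starting the program under something like `taskset -c 0` would keep the thread on one CPU, roughly mirroring the affinity setup of my experiments. This only illustrates the expected pattern; it says nothing about what the hardware actually does on the bus.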
Is my understanding correct? Are there any other reasons for FSB activity related to locks?
--Elad