I am looking into some scalability issues on a 4-way Xeon machine (4 separate CPUs, not
cores). I believe I have tracked the problem down to bus contention: OProfile results
show a strong correlation between instructions that report a high number of
global_power_events and instructions that report a high number of FSB_data_activity events.
Some of these events are easily explained. What surprises me, however, is that certain
lock operations seem to cause considerable bus activity. For example, a call to
spin_unlock_irqrestore() from e1000_xmit_frame(). The strange thing is that the
experiments I am running strictly partition the NICs among the CPUs (both interrupt and
process affinity), so there is no contention on the lock (verified with lockstat).
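
For reference, the code path I am talking about has roughly this shape. The structure
and function names below are only illustrative, not the actual e1000 code; the point is
that the tx lock word is touched with spin_lock_irqsave()/spin_unlock_irqrestore() on
every transmitted packet even though only one CPU ever takes the lock:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>

/* Hypothetical driver-private state; not the real e1000 structures. */
struct my_adapter {
	spinlock_t tx_lock;	/* protects the tx descriptor ring */
	/* tx ring, head/tail indices, ... */
};

static int my_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
{
	struct my_adapter *adapter = netdev_priv(netdev);
	unsigned long flags;

	spin_lock_irqsave(&adapter->tx_lock, flags);

	/* queue skb on the tx ring and kick the hardware ... */

	spin_unlock_irqrestore(&adapter->tx_lock, flags);
	return NETDEV_TX_OK;
}
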
My understanding is that a lock variable accessed by only a single CPU should stay in
that CPU's cache in the Modified state, since no other CPU ever invalidates the line,
and so accesses to it should cause little if any FSB activity. It is also worth noting
that the number of FSB events per CPU increases considerably when moving from 3 to 4
CPUs, while the number of cache misses stays roughly the same.
Is my understanding correct? Are there any other reasons for FSB activity related to locks?
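
To make the scenario concrete, below is a minimal user-space model of the uncontended
case I am describing. It only uses GCC atomics and is not the kernel's spinlock
implementation, but the acquire path is the same kind of atomic read-modify-write
(an implicitly locked XCHG on x86), and only one CPU ever touches lock_word, mirroring
the partitioned-NIC setup:

#include <stdio.h>

static volatile int lock_word;	/* 0 = free, 1 = held */

static void my_lock(volatile int *l)
{
	/* Atomic exchange; compiles to an (implicitly locked) XCHG on x86. */
	while (__sync_lock_test_and_set(l, 1))
		;	/* never spins in the uncontended case */
}

static void my_unlock(volatile int *l)
{
	__sync_lock_release(l);	/* release the lock by storing 0 */
}

int main(void)
{
	long i;

	/* The same CPU acquires and releases repeatedly; no other CPU
	 * ever touches lock_word. */
	for (i = 0; i < 100000000L; i++) {
		my_lock(&lock_word);
		my_unlock(&lock_word);
	}
	printf("done\n");
	return 0;
}

Pinning this to one CPU (e.g. with taskset) and watching FSB_data_activity should show
whether the locked exchange alone generates bus traffic even when the line stays in the
local cache.
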
--Elad