On 02/26/2013 07:47 PM, Lai Jiangshan wrote: > On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat > <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> wrote: >> Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many >> lock-ordering related problems (unlike per-cpu locks). However, global >> rwlocks lead to unnecessary cache-line bouncing even when there are no >> writers present, which can slow down the system needlessly. > > per-CPU rwlocks(yours and mine) are the exactly same as rwlock_t in > the view of lock dependency(except reader-C.S. can be nested in > writer-C.S.) > so they can deadlock in this order: > > spin_lock(some_lock); percpu_write_lock_irqsave() > case CPU_DYING > percpu_read_lock_irqsafe(); <---deadlock---> spin_lock(some_lock); > Yes, of course! But this is the most straight-forward of cases to fix, because there are a well-defined number of CPU_DYING notifiers, and the fix is to make the lock ordering same at both sides. The real challenge is with cases like below: spin_lock(some_lock) percpu_read_lock_irqsafe() percpu_read_lock_irqsafe() spin_lock(some_lock) The locking primitives (percpu_read_lock/unlock_irqsafe) have been explicitly designed to keep cases like the above deadlock-free. Because, they ease conversions from preempt_disable() to the new locking primitive without lock-ordering headache. > The lockdep can find out such dependency, but we must try our best to > find out them before merge the patchset to mainline. We can review > all the code of cpu_disable() and CPU_DYING and fix this kinds of lock > dependency, but it is not easy thing, it may be a long term project. > :-) That's exactly what I have done in this patchset, and that's why there are 46 patches in this series! :-) I have run this patchset with lockdep turned on, and verified that there are no locking problems due to the conversion. I even posted out performance numbers from this patchset (ie., in comparison to stop_machine), if you are curious... http://article.gmane.org/gmane.linux.kernel/1435249 But yes, I'll have to re-verify this because of the new code that went in during this merge window. > ====== > And if there is any CPU_DYING code takes no locks and do some > works(because they know they are called via stop_machine()) we need to > add that locking code back if there is such code.(I don't know whether > such code exist or not) > Yes, I explicitly verified this too. (I had mentioned this in the cover letter). Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html