On 12/20/2011 04:38 PM, Srivatsa S. Bhat wrote:
> On 12/20/2011 04:06 PM, Srivatsa S. Bhat wrote:
>> On 12/20/2011 03:07 PM, mengcong wrote:
>>> On Tue, 2011-12-20 at 12:58 +0530, Srivatsa S. Bhat wrote:
>>>> On 12/20/2011 11:57 AM, Al Viro wrote:
>>>>> On Tue, Dec 20, 2011 at 10:26:05AM +0530, Srivatsa S. Bhat wrote:
>>>>>> Oh, right, that has to be handled as well...
>>>>>>
>>>>>> Hmmm... How about registering a CPU hotplug notifier callback at lock init time, and then for every cpu that gets onlined (after we took a copy of the cpu_online_mask to work with), we see if that cpu is different from the ones we have already locked, and if it is, we lock it in the callback handler and update the locked_cpu_mask appropriately (so that we release the locks properly during the unlock operation).
>>>>>>
>>>>>> Handling the newly introduced race between the callback handler and the lock-unlock code should not be difficult, I believe.
>>>>>>
>>>>>> Any loopholes in this approach? Or is the additional complexity just not worth it here?
>>>>>
>>>>> To summarize the modified variant of that approach hashed out on IRC:
>>>>>
>>>>> * lglock grows three extra things: a spinlock, a cpu bitmap and a cpu hotplug notifier.
>>>>> * foo_global_lock_online starts with grabbing that spinlock and loops over the cpus in that bitmap.
>>>>> * foo_global_unlock_online loops over the same bitmap and then drops that spinlock.
>>>>> * the callback of the notifier does all bitmap updates, under that spinlock. Events that need handling definitely include things like "was going up but failed", since we need the bitmap to contain all online CPUs at all times, preferably without too much junk beyond that. IOW, we need to add a cpu to the bitmap _before_ low-level __cpu_up() calls set_cpu_online(), which means we want to clean up on a failed attempt to bring it up. Taking a CPU down is probably less of a PITA; just clear the bit on the final "the sucker's dead" event.
>>>>> * the bitmap is initialized once, at the same time we set the notifier up. Just grab the spinlock and do
>>>>>		for_each_online_cpu(N)
>>>>>			add N to bitmap
>>>>>   then release the spinlock and let the callbacks handle all updates.
>>>>>
>>>>> I think that'll work with relatively little pain, but I'm not familiar enough with the cpu hotplug notifiers, so I'd rather have the folks familiar with those supply the set of events to watch for...
>>>>
>>>> We need not watch out for "up failed" events. It is enough if we handle the CPU_ONLINE and CPU_DEAD events, because these two events are triggered only upon a successful online or offline operation, and these notifications are more than enough for our purpose (to update our bitmaps). Also, the cpus which came online won't start running until these "success notifications" are all done, which is where we do our stuff in the callback (i.e., try grabbing the spinlock).
>>>>
>>>> Of course, by doing this (only looking out for CPU_ONLINE and CPU_DEAD events), our bitmap will probably be one step behind cpu_online_mask (which means we'll still have to take a snapshot of cpu_online_mask and work with it instead of using for_each_online_cpu()).
>>>> But that doesn't matter, as long as:
>>>> * we don't allow the newly onlined CPU to start executing code (this is achieved by taking the spinlock in the callback)
>>>
>>> I think the cpu notifier callback doesn't always run on the UPing cpu. Actually, it rarely runs on the UPing cpu. Even if I am wrong about that, there is still a chance that lglock operations get scheduled on the UPing cpu before the callback is called.
>>
>> I wasn't actually banking on that, but you have raised a very good point. The scheduler uses its own set of cpu hotplug callback handlers to start using the newly added cpu (see the set of callbacks in kernel/sched.c).
>>
>> So now we have a race between our callback and the scheduler's callbacks. ("Placing" our callback in a safe position using notifier priorities doesn't appeal to me that much, since it looks like too much hackery. It should probably be our last resort.)
>
> Hey, actually there is a simple solution: just nip it (or rather delay it) in the bud ;) That is, we watch out for the CPU_UP_PREPARE event and lock things up right there using our spinlock; that way, that cpu will not come up until we are done executing br_write_unlock(). In fact, we could even fail the onlining of that cpu by returning an appropriate value from our callback, but that would be too harsh, so we can settle for delaying the cpu online operation :)
>
> And we do the same thing for the CPU_DOWN_PREPARE event. And we can forget about taking a snapshot of cpu_online_mask, since cpu_online_mask is guaranteed to be unmodified until we are done with br_write_unlock().

I had to modify this a bit to handle a race while using cpu_online_mask. Anyway, here is the code:

The idea is that the notifier grabs the spinlock and does not release it until the entire cpu hotplug operation is complete. Hence the pairs: CPU_UP_PREPARE <-> {CPU_UP_CANCELED, CPU_ONLINE}, and CPU_DOWN_PREPARE <-> {CPU_DOWN_FAILED, CPU_DEAD}.

However, if the notifier cannot acquire the spinlock, it keeps spinning at the CPU_UP_PREPARE or CPU_DOWN_PREPARE event, thereby preventing the cpu hotplug operation as well as any update to cpu_online_mask. Hence, taking a snapshot of cpu_online_mask becomes unnecessary.

[Actually I have another approach, using the CPU_STARTING and CPU_DYING events, which addresses Cong Meng's concern in a more direct manner, but I would rather post that patch after some time, to avoid the risk of confusing everyone. Moreover, that approach is not a "clear winner", so it can wait until the patch below gets reviewed.]
---
 include/linux/lglock.h |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index f549056..71079b1 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -22,6 +22,7 @@
 #include <linux/spinlock.h>
 #include <linux/lockdep.h>
 #include <linux/percpu.h>
+#include <linux/cpu.h>
 
 /* can make br locks by using local lock for read side, global lock for write */
 #define br_lock_init(name)	name##_lock_init()
@@ -75,6 +76,41 @@
 DEFINE_PER_CPU(arch_spinlock_t, name##_lock);			\
 DEFINE_LGLOCK_LOCKDEP(name);					\
								\
+DEFINE_SPINLOCK(name##_notifier_lock);				\
+								\
+static int							\
+name##_lg_cpu_callback(struct notifier_block *nb,		\
+				unsigned long action, void *hcpu) \
+{								\
+	switch (action & ~CPU_TASKS_FROZEN) {			\
+	case CPU_UP_PREPARE:					\
+		spin_lock(&name##_notifier_lock);		\
+		break;						\
+								\
+	case CPU_UP_CANCELED:					\
+	case CPU_ONLINE:					\
+		spin_unlock(&name##_notifier_lock);		\
+		break;						\
+								\
+	case CPU_DOWN_PREPARE:					\
+		spin_lock(&name##_notifier_lock);		\
+		break;						\
+								\
+	case CPU_DOWN_FAILED:					\
+	case CPU_DEAD:						\
+		spin_unlock(&name##_notifier_lock);		\
+		break;						\
+								\
+	default:						\
+		return NOTIFY_DONE;				\
+	}							\
+	return NOTIFY_OK;					\
+}								\
+								\
+static struct notifier_block name##_lg_cpu_notifier = {	\
+	.notifier_call = name##_lg_cpu_callback,		\
+};								\
+								\
 void name##_lock_init(void) {					\
 	int i;							\
 	LOCKDEP_INIT_MAP(&name##_lock_dep_map, #name, &name##_lock_key, 0); \
@@ -83,6 +119,7 @@
 		lock = &per_cpu(name##_lock, i);		\
 		*lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; \
 	}							\
+	register_hotcpu_notifier(&name##_lg_cpu_notifier);	\
 }								\
 EXPORT_SYMBOL(name##_lock_init);				\
 								\
@@ -125,6 +162,7 @@
 void name##_global_lock_online(void) {				\
 	int i;							\
 	preempt_disable();					\
+	spin_lock(&name##_notifier_lock);			\
 	rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_);	\
 	for_each_online_cpu(i) {				\
 		arch_spinlock_t *lock;				\
@@ -142,6 +180,7 @@
 		lock = &per_cpu(name##_lock, i);		\
 		arch_spin_unlock(lock);				\
 	}							\
+	spin_unlock(&name##_notifier_lock);			\
 	preempt_enable();					\
 }								\
 EXPORT_SYMBOL(name##_global_unlock_online);			\

Regards,
Srivatsa S. Bhat
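
[Editor's note: for readers who want to see how the functions patched above are reached, here is a minimal, hypothetical usage sketch. It assumes only the br_* wrapper macros already present in include/linux/lglock.h (br_lock_init, br_read_lock/br_read_unlock, and br_write_lock/br_write_unlock, which expand to name##_global_lock_online()/name##_global_unlock_online()). The lock name demo_lock and the surrounding functions are invented for illustration and are not part of the patch.]

/* Hypothetical example; demo_lock and these functions are made up. */
#include <linux/init.h>
#include <linux/lglock.h>

DEFINE_BRLOCK(demo_lock);	/* per-cpu locks plus, with this patch, the notifier state */

static int __init demo_init(void)
{
	/* br_lock_init() -> demo_lock_lock_init(), which now also registers the notifier */
	br_lock_init(demo_lock);
	return 0;
}

static void demo_read_side(void)
{
	/* Fast path: only the current CPU's spinlock is taken. */
	br_read_lock(demo_lock);
	/* ... read-side critical section ... */
	br_read_unlock(demo_lock);
}

static void demo_write_side(void)
{
	/*
	 * Slow path: br_write_lock() -> demo_lock_global_lock_online(),
	 * which takes every online CPU's spinlock.  With the patch above,
	 * demo_lock_notifier_lock is held from here until the unlock, so
	 * no CPU can be hotplugged in between and the set of per-cpu locks
	 * released in br_write_unlock() is exactly the set acquired here.
	 */
	br_write_lock(demo_lock);
	/* ... write-side critical section ... */
	br_write_unlock(demo_lock);
}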