On 12/19/2011 02:42 PM, Stephen Boyd wrote: > On 12/18/2011 11:31 PM, Srivatsa S. Bhat wrote: >> Hi, >> >> I feel the following patch is a better fix for 2 reasons: >> >> 1. As Al Viro pointed out, if we do for_each_possible_cpus() then we >> might >> encounter unnecessary performance hit in some scenarios. So working with >> only online cpus, safely(a.k.a race-free), if possible, would be a good >> solution (which this patch implements). >> >> 2. *_global_lock_online() and *_global_unlock_online() needs fixing as >> well >> because, the names suggest that they lock/unlock per-CPU locks of only >> the >> currently online CPUs, but unfortunately they do not have any >> synchronization >> to prevent offlining those CPUs in between, if it happens to race with >> a CPU >> hotplug operation. >> >> And if we solve issue 2 above "carefully" (as mentioned in the >> changelog below), >> it solves this whole thing! > > We started seeing this same problem last week. I've come up with almost > the same solution but you beat me to the list! > Oh :-) >> diff --git a/include/linux/lglock.h b/include/linux/lglock.h >> index f549056..583d1a8 100644 >> --- a/include/linux/lglock.h >> +++ b/include/linux/lglock.h >> @@ -126,6 +127,7 @@ >> int i; \ >> preempt_disable(); \ >> rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_); \ >> + get_online_cpus(); \ >> for_each_online_cpu(i) { \ >> arch_spinlock_t *lock; \ >> lock =&per_cpu(name##_lock, i); \ >> @@ -142,6 +144,7 @@ >> lock =&per_cpu(name##_lock, i); \ >> arch_spin_unlock(lock); \ >> } \ >> + put_online_cpus(); \ >> preempt_enable(); \ >> } \ >> EXPORT_SYMBOL(name##_global_unlock_online); \ > > Don't you want to call {get,put}_online_cpus() outside the > preempt_{disable,enable}()? Otherwise you are scheduling while atomic? > Oh right, thanks for spotting this! (and thanks for your Ack too!) > With that fixed > > Acked-by: Stephen Boyd <sboyd@xxxxxxxxxxxxxx> > > but I wonder if taking the hotplug mutex even for a short time reduces > the effectiveness of these locks? Or is it more about fast readers and > slow writers? > IMHO, we don't need to be concerned here because, {get,put}_online_cpus() implement a refcounting solution, and they don't really serialize stuff unnecessarily. The readers (those who prevent cpu hotplug, such as this lock- unlock code) are fast and can be concurrent, while the writers (the task that is doing the cpu hotplug) waits till all existing readers are gone/done with their work. So, since we are readers here, IMO, we don't have to worry about performance. (I know that we get serialized just for a moment while incrementing the refcount, but that should not be worrisome right?) Moreover, using for_each_online_cpu() without using {get,put}_online_cpus() around that, is plain wrong, because of the unhandled race with cpu hotplug. IOW, our primary concern here is functionality, isn't it? To summarize, in the current design of these VFS locks, using {get,put}_online_cpus() is *essential* to fix a functionality-related bug, (and not so bad performance-wise as well). The following patch (v2) incorporates your comments: --- From: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> Subject: [PATCH v2] VFS: Fix race between CPU hotplug and *_global_[un]lock_online() The *_global_[un]lock_online() macros defined in include/linux/lglock.h can race with CPU hotplug operations. Fix this race by using the pair get_online_cpus() and put_online_cpus() around them, so as to prevent CPU hotplug happening at the same time. But be careful to note the semantics here: if we enable CPU hotplug in-between a lock_online() and the corresponding unlock_online(), the lock states can get messed up, since we might end up for example, in a situation such as taking a lock on an online CPU but not releasing it because that CPU was offline when unlock_online() was invoked (thanks to Cong Meng for debugging this issue). [Soft-lockups detected as a consequence of this issue was reported earlier in https://lkml.org/lkml/2011/8/24/185.] So, we are careful to allow CPU hotplug only after the lock-unlock sequence is complete. v2: Moved {get,put}_online_cpus() outside of preempt_{disable,enable} to avoid scheduling while atomic. Debugged-by: Cong Meng <mc@xxxxxxxxxxxxxxxxxx> Acked-by: Stephen Boyd <sboyd@xxxxxxxxxxxxxx> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> --- include/linux/lglock.h | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/include/linux/lglock.h b/include/linux/lglock.h index f549056..7ef257d 100644 --- a/include/linux/lglock.h +++ b/include/linux/lglock.h @@ -22,6 +22,7 @@ #include <linux/spinlock.h> #include <linux/lockdep.h> #include <linux/percpu.h> +#include <linux/cpu.h> /* can make br locks by using local lock for read side, global lock for write */ #define br_lock_init(name) name##_lock_init() @@ -124,6 +125,7 @@ \ void name##_global_lock_online(void) { \ int i; \ + get_online_cpus(); \ preempt_disable(); \ rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_); \ for_each_online_cpu(i) { \ @@ -143,6 +145,7 @@ arch_spin_unlock(lock); \ } \ preempt_enable(); \ + put_online_cpus(); \ } \ EXPORT_SYMBOL(name##_global_unlock_online); \ \ -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html