On 10/03/2012 11:38 AM, Srivatsa S. Bhat wrote:
> On 10/03/2012 09:20 AM, Srivatsa S. Bhat wrote:
>> On 10/03/2012 06:15 AM, Jiri Kosina wrote:
>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>
>>>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>>>
>>>>>> Indeed. Slab seems to be doing an rcu_barrier() in a CPU hotplug
>>>>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
>>>>>> CPU hotplug events. I could go back to the old approach, but it is
>>>>>> significantly more complex. I cannot say that I am all that happy about
>>>>>> anyone calling rcu_barrier() from a CPU hotplug notifier because it
>>>>>> doesn't help CPU hotplug latency, but that is a separate issue.
>>>>>>
>>>>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>>>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>>>>> notifier. You see, either way, the CPU cannot go away while rcu_barrier()
>>>>>> is executing. So the right way to resolve this seems to be to do the
>>>>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>>>>> of a hotplug notifier. Should be fixable without too much hassle...
>>>>>
>>>>> Sorry, I don't think I understand what you are proposing just yet.
>>>>>
>>>>> If I understand it correctly, you are proposing to introduce some magic
>>>>> into _rcu_barrier() such as (pseudocode of course):
>>>>>
>>>>>         if (!being_called_from_hotplug_notifier_callback)
>>>>>                 get_online_cpus();
>>>>>
>>>>> How does that protect from the scenario I've outlined before though?
>>>>>
>>>>>   CPU 0                           CPU 1
>>>>>   kmem_cache_destroy()
>>>>>   mutex_lock(slab_mutex)
>>>>>                                   _cpu_up()
>>>>>                                   cpu_hotplug_begin()
>>>>>                                   mutex_lock(cpu_hotplug.lock)
>>>>>   rcu_barrier()
>>>>>   _rcu_barrier()
>>>>>   get_online_cpus()
>>>>>   mutex_lock(cpu_hotplug.lock)
>>>>>   (blocks, CPU 1 has the mutex)
>>>>>                                   __cpu_notify()
>>>>>                                   mutex_lock(slab_mutex)
>>>>>
>>>>> CPU 0 grabs both locks anyway (it's not running from a notifier callback).
>>>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called
>>>>> from a notifier callback either.
>>>>>
>>>>> What did I miss?
>>>>
>>>> You didn't miss anything, I was suffering a failure to read carefully.
>>>>
>>>> So my next stupid question is "Why can't kmem_cache_destroy() drop
>>>> slab_mutex early?", like the following:
>>>>
>>>> void kmem_cache_destroy(struct kmem_cache *cachep)
>>>> {
>>>>         BUG_ON(!cachep || in_interrupt());
>>>>
>>>>         /* Find the cache in the chain of caches. */
>>>>         get_online_cpus();
>>>>         mutex_lock(&slab_mutex);
>>>>         /*
>>>>          * the chain is never empty, cache_cache is never destroyed
>>>>          */
>>>>         list_del(&cachep->list);
>>>>         if (__cache_shrink(cachep)) {
>>>>                 slab_error(cachep, "Can't free all objects");
>>>>                 list_add(&cachep->list, &slab_caches);
>>>>                 mutex_unlock(&slab_mutex);
>>>>                 put_online_cpus();
>>>>                 return;
>>>>         }
>>>>         mutex_unlock(&slab_mutex);
>>>>
>>>>         if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>>>>                 rcu_barrier();
>>>>
>>>>         __kmem_cache_destroy(cachep);
>>>>         put_online_cpus();
>>>> }
>>>>
>>>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>>>> Looks to me like it is just freeing now-disconnected memory.
>>>
>>> Good question. I believe it should be safe to drop slab_mutex earlier, as
>>> cachep has already been unlinked. I am adding slab people and linux-mm to
>>> CC (the whole thread on LKML can be found at
>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>
>>> How about the patch below? Pekka, Christoph, please?
>>>
>>> It makes lockdep happy again, and obviously removes the deadlock (I
>>> tested it).
>>>
>>>
>>>
>>> From: Jiri Kosina <jkosina@xxxxxxx>
>>> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>>>
>>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
>>> __stop_machine()") introduced a slab_mutex -> cpu_hotplug.lock
>>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
>>> _rcu_barrier() -> get_online_cpus().
>>>
>>> This opens a possibility for deadlock:
>>>
>>>   CPU 0                           CPU 1
>>>   kmem_cache_destroy()
>>>   mutex_lock(slab_mutex)
>>>                                   _cpu_up()
>>>                                   cpu_hotplug_begin()
>>>                                   mutex_lock(cpu_hotplug.lock)
>>>   rcu_barrier()
>>>   _rcu_barrier()
>>>   get_online_cpus()
>>>   mutex_lock(cpu_hotplug.lock)
>>>   (blocks, CPU 1 has the mutex)
>>>                                   __cpu_notify()
>>>                                   mutex_lock(slab_mutex)
>>
>> Hmm.. no, this should *never* happen IMHO!
>>
>> If I am reading the code right, kmem_cache_destroy() wraps its entire content
>> inside get/put_online_cpus(), which means it cannot run concurrently with cpu_up()
>> or cpu_down(). Are we really hitting a corner case where the refcounting logic
>> in get/put_online_cpus() is failing and allowing a hotplug writer to run in
>> parallel with a hotplug reader? If yes, *that* is the problem we have to fix.
>>
>
> One scenario where we can see this happen is if we had a put_online_cpus() "leak"
> somewhere.. that is, perhaps the task that was about to invoke kmem_cache_destroy()
> previously called put_online_cpus() once too often. In that case, the get_online_cpus()
> in kmem_cache_destroy() might prove useless in excluding it from concurrent hotplug
> operations.
>
> Jiri, can you please try the debug patch below? It would be good to see if the
> refcount ever went negative...
>

Better to catch the bug even earlier, at the right moment, in put_online_cpus()
itself. Could you try this one instead?

Regards,
Srivatsa S. Bhat

----------------------------------------------------

 kernel/cpu.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
+	BUG_ON(cpu_hotplug.refcount == 0);
 	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
 		wake_up_process(cpu_hotplug.active_writer);
 	mutex_unlock(&cpu_hotplug.lock);
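As an aside for readers following the lock-ordering argument in this thread: the
deadlock shape is an ordinary AB-BA inversion, and it can be reproduced in user
space with two pthread mutexes standing in for slab_mutex and cpu_hotplug.lock.
The sketch below is purely illustrative (the function names and sleeps are made
up to widen the race window); it is not kernel code.

/*
 * Minimal user-space model of the AB-BA deadlock described above.
 * Two pthread mutexes stand in for slab_mutex and cpu_hotplug.lock,
 * and the sleeps just widen the race window.  Illustration only,
 * not kernel code.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t slab_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cpu_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* "CPU 0": kmem_cache_destroy() -> rcu_barrier() -> get_online_cpus() */
static void *destroy_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&slab_mutex);
	sleep(1);
	pthread_mutex_lock(&cpu_hotplug_lock);	/* blocks: the other thread holds it */
	pthread_mutex_unlock(&cpu_hotplug_lock);
	pthread_mutex_unlock(&slab_mutex);
	return NULL;
}

/* "CPU 1": _cpu_up() -> cpu_hotplug_begin() -> __cpu_notify() -> slab notifier */
static void *hotplug_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&cpu_hotplug_lock);
	sleep(1);
	pthread_mutex_lock(&slab_mutex);	/* blocks: the other thread holds it */
	pthread_mutex_unlock(&slab_mutex);
	pthread_mutex_unlock(&cpu_hotplug_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, destroy_path, NULL);
	pthread_create(&b, NULL, hotplug_path, NULL);
	pthread_join(a, NULL);	/* never returns: the two threads deadlock */
	pthread_join(b, NULL);
	printf("not reached\n");
	return 0;
}

Built with -pthread, the program simply hangs; a lock-order checker such as
Helgrind reports the same circular dependency that lockdep flagged in the kernel.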
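Similarly, a simplified user-space model of the refcounting that
get/put_online_cpus() relied on at the time helps show why a single leaked
put_online_cpus() defeats hotplug exclusion, and what the proposed BUG_ON()
catches. The refcount variable and assert() below are stand-ins for
cpu_hotplug.refcount and BUG_ON(); the locking and writer wakeup are omitted.
This is a sketch of the idea, not the kernel implementation.

/*
 * Simplified user-space model of the get/put_online_cpus() refcounting
 * discussed above.  assert() stands in for the proposed
 * BUG_ON(cpu_hotplug.refcount == 0).  Illustration only, not kernel code.
 */
#include <assert.h>
#include <stdio.h>

static int refcount;	/* models cpu_hotplug.refcount */

static void get_online_cpus(void)
{
	refcount++;	/* a hotplug writer must wait until this drops back to 0 */
}

static void put_online_cpus(void)
{
	assert(refcount != 0);	/* the proposed check: trap the leak right here */
	if (!--refcount)
		printf("refcount hit 0: a hotplug writer may now proceed\n");
}

int main(void)
{
	get_online_cpus();
	put_online_cpus();

	/*
	 * The hypothesized leak: one put_online_cpus() too many.  Without the
	 * check, refcount would go to -1, and the next get_online_cpus() would
	 * only bring it back to 0, so it would no longer hold off a concurrent
	 * hotplug operation.  With the check, we trap at the offending call.
	 */
	put_online_cpus();
	return 0;
}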