Re: Deadlock between CPU offline and kmem_cache_create

Vlastimil Babka <vbabka@xxxxxxx> · Mon, 3 May 2021 12:58:19 +0200

+CC's

On 5/2/21 7:18 PM, Paul E. McKenney wrote:
> Hello!

Hi!

> Commit 1f0723a4c0df ("mm, slub: enable slub_debug static key when creating
> cache with explicit debug flags") results in the lockdep complaint (see
> below) in the presence of CPU-hotplug offline operations.  I triggered
> and bisected this using the following command:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10 --configs TREE05 --trust-make
> 
> This triggers when rcutorture's kmem_cache_create() for its testing of
> mem_dump_obj().  When I reverted commit 1f0723a4c0df, the splat went away.
> 
> I tried moving rcutorture's mem_dump_obj() testing to rcutorture's
> module-init function, but that did not help.  In retrospect, this is no
> surprise because the deadlock is caused by the call to kmem_cache_create()
> and the slab CPU-hotplug notifiers.  There is no lock in this deadlock
> cycle that is under rcutorture's control.

So if I understand correctly it's because there's one sequence of
locks from the hotplug callbacks:

lock(cpu_hotplug_lock); // from hotplug machinery itself
lock(slab_mutex); // in e.g. slab_mem_going_offline_callback()

And my commit made the reverse sequence possible:
lock(slab_mutex); // in kmem_cache_create_usercopy()
lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable()

> I could imagine moving the static_branch_enable() out into a clean
> environment, but this would of course require some mechanism to make
> sure that the slab was still in existence at that time.  One way to do
> this would be to have a timer that is set at the site of the current
> static_branch_enable() and deleted at slab-deletion time.  Plus there
> would be a short period of time when debugging would not yet be enabled
> for this slab (would that be a problem?).

Perhaps the cleanest solution would simply be to move the whole hunk of commit
1f0723a4c0df to kmem_cache_create_usercopy() before the slab_mutex is taken?
See the end of my e-mail. Would that fix your testcase?
 
> This time could be minimized using (say) an hrtimer timeout of 1
> microsecond or some such.  It could be eliminated by having the timer
> handler do a wakeup that kmem_cache_create() waits for at some point
> after it releases slab_mutex.
> 
> Alternatively, maybe some way can be found to avoid acquiring slab_mutex
> in slab's CPU-hotplug notifiers.

Hm that would be much harder to get right and verify all is OK, I'm afraid.
Actually the cpu hotplug locking was also changed recently by me in 59450bbc12be
("mm, slab, slub: stop taking cpu hotplug lock") which mentions how some
operations are protected by slab_mutex. I was happy to get rid of the cpu
hotplug lock, but ironically if that didn't happen, then 1f0723a4c0df would have
been now fine (using static_key_enable_cpuslocked()) as it would be using the
same lock order as hotplug :)

Thanks, Vlastimil

> Thoughts?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> [  602.429399] ======================================================
> [  602.429777] WARNING: possible circular locking dependency detected
> [  602.430156] 5.12.0+ #15 Not tainted
> [  602.430374] ------------------------------------------------------
> [  602.430759] rcu_torture_sta/109 is trying to acquire lock:
> [  602.431099] ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20
> [  602.431630] 
> [  602.431630] but task is already holding lock:
> [  602.431992] ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
> [  602.432541] 
> [  602.432541] which lock already depends on the new lock.
> [  602.432541] 
> [  602.433039] 
> [  602.433039] the existing dependency chain (in reverse order) is:
> [  602.433494] 
> [  602.433494] -> #1 (slab_mutex){+.+.}-{3:3}:
> [  602.433842]        lock_acquire+0xb9/0x3a0
> [  602.434107]        __mutex_lock+0x8d/0x920
> [  602.434366]        slub_cpu_dead+0x15/0xf0
> [  602.434625]        cpuhp_invoke_callback+0x17a/0x7c0
> [  602.434938]        cpuhp_invoke_callback_range+0x3b/0x80
> [  602.435266]        _cpu_down+0xdf/0x2a0
> [  602.435504]        cpu_down+0x2c/0x50
> [  602.435734]        device_offline+0x82/0xb0
> [  602.436005]        remove_cpu+0x1a/0x30
> [  602.436243]        torture_offline+0x80/0x140
> [  602.436514]        torture_onoff+0x147/0x260
> [  602.436778]        kthread+0x10a/0x140
> [  602.437013]        ret_from_fork+0x22/0x30
> [  602.437274] 
> [  602.437274] -> #0 (cpu_hotplug_lock){++++}-{0:0}:
> [  602.437654]        check_prev_add+0x8f/0xbf0
> [  602.437919]        __lock_acquire+0x13f0/0x1d80
> [  602.438198]        lock_acquire+0xb9/0x3a0
> [  602.438452]        cpus_read_lock+0x21/0xa0
> [  602.438713]        static_key_enable+0x9/0x20
> [  602.438985]        __kmem_cache_create+0x38d/0x430
> [  602.439284]        kmem_cache_create_usercopy+0x146/0x250
> [  602.439619]        kmem_cache_create+0xd/0x10
> [  602.439895]        rcu_torture_stats+0x79/0x280
> [  602.440179]        kthread+0x10a/0x140
> [  602.440413]        ret_from_fork+0x22/0x30
> [  602.440669] 
> [  602.440669] other info that might help us debug this:
> [  602.440669] 
> [  602.441154]  Possible unsafe locking scenario:
> [  602.441154] 
> [  602.441523]        CPU0                    CPU1
> [  602.441803]        ----                    ----
> [  602.442085]   lock(slab_mutex);
> [  602.442281]                                lock(cpu_hotplug_lock);
> [  602.442662]                                lock(slab_mutex);
> [  602.443009]   lock(cpu_hotplug_lock);
> [  602.443239] 
> [  602.443239]  *** DEADLOCK ***
> [  602.443239] 
> [  602.443606] 1 lock held by rcu_torture_sta/109:
> [  602.443892]  #0: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
> [  602.444472] 
> [  602.444472] stack backtrace:
> [  602.444743] CPU: 3 PID: 109 Comm: rcu_torture_sta Not tainted 5.12.0+ #15
> [  602.445176] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [  602.445728] Call Trace:
> [  602.445891]  dump_stack+0x6d/0x89
> [  602.446116]  check_noncircular+0xfe/0x110
> [  602.446401]  ? lock_is_held_type+0x98/0x110
> [  602.446664]  check_prev_add+0x8f/0xbf0
> [  602.446908]  __lock_acquire+0x13f0/0x1d80
> [  602.447162]  lock_acquire+0xb9/0x3a0
> [  602.447385]  ? static_key_enable+0x9/0x20
> [  602.447640]  ? mark_held_locks+0x49/0x70
> [  602.447894]  cpus_read_lock+0x21/0xa0
> [  602.448124]  ? static_key_enable+0x9/0x20
> [  602.448373]  static_key_enable+0x9/0x20
> [  602.448614]  __kmem_cache_create+0x38d/0x430
> [  602.448882]  kmem_cache_create_usercopy+0x146/0x250
> [  602.449184]  ? rcu_torture_stats_print+0xd0/0xd0
> [  602.449469]  kmem_cache_create+0xd/0x10
> [  602.449708]  rcu_torture_stats+0x79/0x280
> [  602.449964]  ? rcu_torture_stats_print+0xd0/0xd0
> [  602.450251]  kthread+0x10a/0x140
> [  602.450452]  ? kthread_park+0x80/0x80
> [  602.450682]  ret_from_fork+0x22/0x30
> 

----8<----

diff --git a/mm/slab_common.c b/mm/slab_common.c
index f8833d3e5d47..a4a571428c51 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -318,6 +318,16 @@ kmem_cache_create_usercopy(const char *name,
 	const char *cache_name;
 	int err;
 
+#ifdef CONFIG_SLUB_DEBUG
+	/*
+	 * If no slub_debug was enabled globally, the static key is not yet
+	 * enabled by setup_slub_debug(). Enable it if the cache is being
+	 * created with any of the debugging flags passed explicitly.
+	 */
+	if (flags & SLAB_DEBUG_FLAGS)
+		static_branch_enable(&slub_debug_enabled);
+#endif
+
 	mutex_lock(&slab_mutex);
 
 	err = kmem_cache_sanity_check(name, size);
diff --git a/mm/slub.c b/mm/slub.c
index 68123b21e65f..ec87ae218d45 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3828,15 +3828,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 
 static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 {
-#ifdef CONFIG_SLUB_DEBUG
-	/*
-	 * If no slub_debug was enabled globally, the static key is not yet
-	 * enabled by setup_slub_debug(). Enable it if the cache is being
-	 * created with any of the debugging flags passed explicitly.
-	 */
-	if (flags & SLAB_DEBUG_FLAGS)
-		static_branch_enable(&slub_debug_enabled);
-#endif
 	s->flags = kmem_cache_flags(s->size, flags, s->name);
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
 	s->random = get_random_long();