> On Dec 9, 2019, at 10:21 PM, Qian Cai <cai@xxxxxx> wrote: > > > >> On Dec 9, 2019, at 3:33 PM, Christopher Lameter <cl@xxxxxxxxx> wrote: >> >> On Mon, 9 Dec 2019, Daniel Wagner wrote: >> >>> [ 5.038862] >>> [ 5.038862] -> #2 (kn->count#39){++++}: >>> [ 5.039329] __kernfs_remove+0x240/0x2e0 >>> [ 5.039717] kernfs_remove_by_name_ns+0x3c/0x80 >>> [ 5.040159] sysfs_slab_add+0x184/0x250 >> >> sysfs_slab_add should not be called under any lock. But it happens during >> an initcall (sysfs_slab_init) when the kmalloc slab array is being set up. >> >> And the problems results from a hotplug event? During system bringup when >> the slab caches have not been setup yet? >> >> Is this really something that can happen? >> >> > > > It happens to me too with a probably easier to trigger deadlock. Basically, we have, > > memcg_create_kmem_cache(): > cpu_hotplug_lock.rw_sem/mem_hotplug_lock.rw_sem —> kn->count Correction — kmemcg_workfn() -> kmemcg_cache_shutdown_fn() does it NOT memcg_create_kmem_cache(). How to fix it is beyond my imagination as the relevant code looks all too old that probably need a redesign. > Then, "slabinfo -d” does the opposite locking order, > > kmem_cache_shrink_all+0x50/0x100 (cpu_hotplug_lock.rw_sem/mem_hotplug_lock.rw_sem) > shrink_store+0x34/0x60 > slab_attr_store+0x6c/0x170 > sysfs_kf_write+0x70/0xb0 > kernfs_fop_write+0x11c/0x270 ((kn->count) > __vfs_write+0x3c/0x70 > vfs_write+0xcc/0x200 > ksys_write+0x7c/0x140 > system_call+0x5c/0x6 > > > [ 1776.927152][ T9264] WARNING: possible circular locking dependency detected > [ 1776.927164][ T9264] 5.5.0-rc1-next-20191209 #2 Not tainted > [ 1776.927172][ T9264] ------------------------------------------------------ > [ 1776.927182][ T9264] slabinfo/9264 is trying to acquire lock: > [ 1776.927204][ T9264] c000000001072c90 (cpu_hotplug_lock.rw_sem){++++}, at: kmem_cache_shrink_all+0x50/0x100 > [ 1776.927224][ T9264] > [ 1776.927224][ T9264] but task is already holding lock: > [ 1776.927246][ T9264] c0002013fae02e90 (kn->count#80){++++}, at: kernfs_fop_write+0xe0/0x270 > [ 1776.927274][ T9264] > [ 1776.927274][ T9264] which lock already depends on the new lock. > [ 1776.927274][ T9264] > [ 1776.927298][ T9264] > [ 1776.927298][ T9264] the existing dependency chain (in reverse order) is: > [ 1776.927332][ T9264] > [ 1776.927332][ T9264] -> #2 (kn->count#80){++++}: > [ 1776.927355][ T9264] __kernfs_remove+0x3b0/0x440 > [ 1776.927374][ T9264] kernfs_remove+0x48/0x70 > [ 1776.927405][ T9264] sysfs_remove_dir+0x78/0xb0 > [ 1776.927437][ T9264] kobject_del+0x48/0xa0 > [ 1776.927458][ T9264] sysfs_slab_unlink+0x38/0x50 > [ 1776.927468][ T9264] shutdown_cache+0x208/0x2b0 > [ 1776.927477][ T9264] kmemcg_cache_shutdown_fn+0x20/0x40 > [ 1776.927509][ T9264] kmemcg_workfn+0x64/0xa0 > [ 1776.927530][ T9264] process_one_work+0x300/0x8e0 > [ 1776.927550][ T9264] worker_thread+0x78/0x530 > [ 1776.927580][ T9264] kthread+0x1a8/0x1b0 > [ 1776.927612][ T9264] ret_from_kernel_thread+0x5c/0x74 > [ 1776.927641][ T9264] > [ 1776.927641][ T9264] -> #1 (slab_mutex){+.+.}: > [ 1776.927663][ T9264] __mutex_lock+0xdc/0xb20 > [ 1776.927672][ T9264] memcg_create_kmem_cache+0x54/0x230 > [ 1776.927704][ T9264] memcg_kmem_cache_create_func+0x3c/0x280 > [ 1776.927725][ T9264] process_one_work+0x300/0x8e0 > [ 1776.927755][ T9264] worker_thread+0x78/0x530 > [ 1776.927774][ T9264] kthread+0x1a8/0x1b0 > [ 1776.927794][ T9264] ret_from_kernel_thread+0x5c/0x74 > [ 1776.927813][ T9264] > [ 1776.927918][ T9264] > [ 1776.927918][ T9264] -> #0 (cpu_hotplug_lock.rw_sem){++++}: > [ 1776.927954][ T9264] __lock_acquire+0x1644/0x2360 > [ 1776.927973][ T9264] lock_acquire+0x130/0x360 > [ 1776.927994][ T9264] cpus_read_lock+0x64/0x170 > [ 1776.928014][ T9264] kmem_cache_shrink_all+0x50/0x100 > [ 1776.928034][ T9264] shrink_store+0x34/0x60 > [ 1776.928065][ T9264] slab_attr_store+0x6c/0x170 > [ 1776.928085][ T9264] sysfs_kf_write+0x70/0xb0 > [ 1776.928115][ T9264] kernfs_fop_write+0x11c/0x270 > [ 1776.928136][ T9264] __vfs_write+0x3c/0x70 > [ 1776.928156][ T9264] vfs_write+0xcc/0x200 > [ 1776.928176][ T9264] ksys_write+0x7c/0x140 > [ 1776.928195][ T9264] system_call+0x5c/0x68 > [ 1776.928214][ T9264] > [ 1776.928214][ T9264] other info that might help us debug this: > [ 1776.928214][ T9264] > [ 1776.928238][ T9264] Chain exists of: > [ 1776.928238][ T9264] cpu_hotplug_lock.rw_sem --> slab_mutex --> kn->count#80 > [ 1776.928238][ T9264] > [ 1776.928289][ T9264] Possible unsafe locking scenario: > [ 1776.928289][ T9264] > [ 1776.928321][ T9264] CPU0 CPU1 > [ 1776.928340][ T9264] ---- ---- > [ 1776.928359][ T9264] lock(kn->count#80); > [ 1776.928377][ T9264] lock(slab_mutex); > [ 1776.928398][ T9264] lock(kn->count#80); > [ 1776.928418][ T9264] lock(cpu_hotplug_lock.rw_sem); > [ 1776.928438][ T9264] > [ 1776.928438][ T9264] *** DEADLOCK *** > [ 1776.928438][ T9264] > [ 1776.928471][ T9264] 3 locks held by slabinfo/9264: > [ 1776.928490][ T9264] #0: c000000042301408 (sb_writers#4){.+.+}, at: vfs_write+0x180/0x200 > [ 1776.928525][ T9264] #1: c00020063c2a5680 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd4/0x270 > [ 1776.928561][ T9264] #2: c0002013fae02e90 (kn->count#80){++++}, at: kernfs_fop_write+0xe0/0x270 > [ 1776.928607][ T9264] > [ 1776.928607][ T9264] stack backtrace: > [ 1776.928640][ T9264] CPU: 65 PID: 9264 Comm: slabinfo Not tainted 5.5.0-rc1-next-20191209 #2 > [ 1776.928684][ T9264] Call Trace: > [ 1776.928705][ T9264] [c00000162edcf790] [c000000000944460] dump_stack+0xf4/0x164 (unreliable) > [ 1776.928740][ T9264] [c00000162edcf7e0] [c0000000001be0e8] print_circular_bug+0x298/0x2b0 > [ 1776.928773][ T9264] [c00000162edcf880] [c0000000001be360] check_noncircular+0x260/0x320 > [ 1776.928807][ T9264] [c00000162edcf980] [c0000000001c38c4] __lock_acquire+0x1644/0x2360 > [ 1776.928840][ T9264] [c00000162edcfae0] [c0000000001c12a0] lock_acquire+0x130/0x360 > [ 1776.928874][ T9264] [c00000162edcfbc0] [c0000000001063b4] cpus_read_lock+0x64/0x170 > [ 1776.928907][ T9264] [c00000162edcfc00] [c000000000384620] kmem_cache_shrink_all+0x50/0x100 > [ 1776.928951][ T9264] [c00000162edcfc40] [c000000000425e14] shrink_store+0x34/0x60 > [ 1776.928973][ T9264] [c00000162edcfc70] [c000000000423f4c] slab_attr_store+0x6c/0x170 > [ 1776.929007][ T9264] [c00000162edcfcd0] [c0000000005b7340] sysfs_kf_write+0x70/0xb0 > [ 1776.929030][ T9264] [c00000162edcfd10] [c0000000005b650c] kernfs_fop_write+0x11c/0x270 > [ 1776.929063][ T9264] [c00000162edcfd60] [c0000000004a313c] __vfs_write+0x3c/0x70 > [ 1776.929127][ T9264] [c00000162edcfd80] [c0000000004a61ec] vfs_write+0xcc/0x200 > [ 1776.929209][ T9264] [c00000162edcfdd0] [c0000000004a658c] ksys_write+0x7c/0x140 > [ 1776.929299][ T9264] [c00000162edcfe20] [c00000000000b378] system_call+0x5c/0x6