Re: [PATCH] memory tier: fix deadlock warning while onlining pages

Hi, Yanfei,

Yanfei Xu <yanfei.xu@xxxxxxxxx> writes:

> memory_tier_init() acquires memory_tier_lock unnecessarily, and its
> locking scope covers the registration of the memory hotplug notifier,
> whose callback also acquires memory_tier_lock. This makes lockdep
> report a possible ABBA deadlock.
>
> Specifically, when a memory online event occurs, the notifier chain
> is walked with the read lock of memory_chain.rwsem held. Meanwhile,
> the notifier registration in memory_tier_init() takes the write lock
> of memory_chain.rwsem while holding memory_tier_lock. When the online
> event then invokes the memory hotplug callback registered by
> memory_tier_init(), that callback tries to acquire memory_tier_lock,
> and a deadlock occurs.

Thank you very much for fixing this!

> In fact, this deadlock cannot actually happen, because
> memory_tier_init() always runs before any memory online event:
> subsys_initcall() has a higher priority than module_init().
>
> [  133.491106] WARNING: possible circular locking dependency detected
> [  133.493656] 6.11.0-rc2+ #146 Tainted: G           O     N
> [  133.504290] ------------------------------------------------------
> [  133.515194] (udev-worker)/1133 is trying to acquire lock:
> [  133.525715] ffffffff87044e28 (memory_tier_lock){+.+.}-{3:3}, at: memtier_hotplug_callback+0x383/0x4b0
> [  133.536449]
> [  133.536449] but task is already holding lock:
> [  133.549847] ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
> [  133.556781]
> [  133.556781] which lock already depends on the new lock.
> [  133.556781]
> [  133.569957]
> [  133.569957] the existing dependency chain (in reverse order) is:
> [  133.577618]
> [  133.577618] -> #1 ((memory_chain).rwsem){++++}-{3:3}:
> [  133.584997]        down_write+0x97/0x210
> [  133.588647]        blocking_notifier_chain_register+0x71/0xd0
> [  133.592537]        register_memory_notifier+0x26/0x30
> [  133.596314]        memory_tier_init+0x187/0x300
> [  133.599864]        do_one_initcall+0x117/0x5d0
> [  133.603399]        kernel_init_freeable+0xab0/0xeb0
> [  133.606986]        kernel_init+0x28/0x2f0
> [  133.610312]        ret_from_fork+0x59/0x90
> [  133.613652]        ret_from_fork_asm+0x1a/0x30
> [  133.617012]
> [  133.617012] -> #0 (memory_tier_lock){+.+.}-{3:3}:
> [  133.623390]        __lock_acquire+0x2efd/0x5c60
> [  133.626730]        lock_acquire+0x1ce/0x580
> [  133.629757]        __mutex_lock+0x15c/0x1490
> [  133.632731]        mutex_lock_nested+0x1f/0x30
> [  133.635717]        memtier_hotplug_callback+0x383/0x4b0
> [  133.638748]        notifier_call_chain+0xbf/0x370
> [  133.641647]        blocking_notifier_call_chain+0x76/0xb0
> [  133.644636]        memory_notify+0x2e/0x40
> [  133.647427]        online_pages+0x597/0x720
> [  133.650246]        memory_subsys_online+0x4f6/0x7f0
> [  133.653107]        device_online+0x141/0x1d0
> [  133.655831]        online_memory_block+0x4d/0x60
> [  133.658616]        walk_memory_blocks+0xc0/0x120
> [  133.661419]        add_memory_resource+0x51d/0x6c0
> [  133.664202]        add_memory_driver_managed+0xf5/0x180
> [  133.667060]        dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
> [  133.669949]        dax_bus_probe+0x147/0x230
> [  133.672687]        really_probe+0x27f/0xac0
> [  133.675463]        __driver_probe_device+0x1f3/0x460
> [  133.678493]        driver_probe_device+0x56/0x1b0
> [  133.681366]        __driver_attach+0x277/0x570
> [  133.684149]        bus_for_each_dev+0x145/0x1e0
> [  133.686937]        driver_attach+0x49/0x60
> [  133.689673]        bus_add_driver+0x2f3/0x6b0
> [  133.692421]        driver_register+0x170/0x4b0
> [  133.695118]        __dax_driver_register+0x141/0x1b0
> [  133.697910]        dax_kmem_init+0x54/0xff0 [kmem]
> [  133.700794]        do_one_initcall+0x117/0x5d0
> [  133.703455]        do_init_module+0x277/0x750
> [  133.706054]        load_module+0x5d1d/0x74f0
> [  133.708602]        init_module_from_file+0x12c/0x1a0
> [  133.711234]        idempotent_init_module+0x3f1/0x690
> [  133.713937]        __x64_sys_finit_module+0x10e/0x1a0
> [  133.716492]        x64_sys_call+0x184d/0x20d0
> [  133.719053]        do_syscall_64+0x6d/0x140
> [  133.721537]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  133.724239]
> [  133.724239] other info that might help us debug this:
> [  133.724239]
> [  133.730832]  Possible unsafe locking scenario:
> [  133.730832]
> [  133.735298]        CPU0                    CPU1
> [  133.737759]        ----                    ----
> [  133.740165]   rlock((memory_chain).rwsem);
> [  133.742623]                                lock(memory_tier_lock);
> [  133.745357]                                lock((memory_chain).rwsem);
> [  133.748141]   lock(memory_tier_lock);
> [  133.750489]
> [  133.750489]  *** DEADLOCK ***
> [  133.750489]
> [  133.756742] 6 locks held by (udev-worker)/1133:
> [  133.759179]  #0: ffff888207be6158 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x26c/0x570
> [  133.762299]  #1: ffffffff875b5868 (device_hotplug_lock){+.+.}-{3:3}, at: lock_device_hotplug+0x20/0x30
> [  133.765565]  #2: ffff88820cf6a108 (&dev->mutex){....}-{3:3}, at: device_online+0x2f/0x1d0
> [  133.768978]  #3: ffffffff86d08ff0 (cpu_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x17/0x30
> [  133.772312]  #4: ffffffff8702dfb0 (mem_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x23/0x30
> [  133.775544]  #5: ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
> [  133.779113]
> [  133.779113] stack backtrace:
> [  133.783728] CPU: 5 UID: 0 PID: 1133 Comm: (udev-worker) Tainted: G           O     N 6.11.0-rc2+ #146
> [  133.787220] Tainted: [O]=OOT_MODULE, [N]=TEST
> [  133.789948] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [  133.793291] Call Trace:
> [  133.795826]  <TASK>
> [  133.798284]  dump_stack_lvl+0xea/0x150
> [  133.801025]  dump_stack+0x19/0x20
> [  133.803609]  print_circular_bug+0x477/0x740
> [  133.806341]  check_noncircular+0x2f4/0x3e0
> [  133.809056]  ? __pfx_check_noncircular+0x10/0x10
> [  133.811866]  ? __pfx_lockdep_lock+0x10/0x10
> [  133.814670]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [  133.817610]  __lock_acquire+0x2efd/0x5c60
> [  133.820339]  ? __pfx___lock_acquire+0x10/0x10
> [  133.823128]  ? __dax_driver_register+0x141/0x1b0
> [  133.825926]  ? do_one_initcall+0x117/0x5d0
> [  133.828648]  lock_acquire+0x1ce/0x580
> [  133.831349]  ? memtier_hotplug_callback+0x383/0x4b0
> [  133.834293]  ? __pfx_lock_acquire+0x10/0x10
> [  133.837134]  __mutex_lock+0x15c/0x1490
> [  133.839829]  ? memtier_hotplug_callback+0x383/0x4b0
> [  133.842753]  ? memtier_hotplug_callback+0x383/0x4b0
> [  133.845602]  ? __this_cpu_preempt_check+0x21/0x30
> [  133.848438]  ? __pfx___mutex_lock+0x10/0x10
> [  133.851200]  ? __pfx_lock_acquire+0x10/0x10
> [  133.853935]  ? global_dirty_limits+0xc0/0x160
> [  133.856699]  ? __sanitizer_cov_trace_switch+0x58/0xa0
> [  133.859564]  mutex_lock_nested+0x1f/0x30
> [  133.862251]  ? mutex_lock_nested+0x1f/0x30
> [  133.864964]  memtier_hotplug_callback+0x383/0x4b0
> [  133.867752]  notifier_call_chain+0xbf/0x370
> [  133.870550]  ? writeback_set_ratelimit+0xe8/0x160
> [  133.873372]  blocking_notifier_call_chain+0x76/0xb0
> [  133.876311]  memory_notify+0x2e/0x40
> [  133.879013]  online_pages+0x597/0x720
> [  133.881686]  ? irqentry_exit+0x3e/0xa0
> [  133.884397]  ? __pfx_online_pages+0x10/0x10
> [  133.887244]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [  133.890299]  ? mhp_init_memmap_on_memory+0x7a/0x1c0
> [  133.893203]  memory_subsys_online+0x4f6/0x7f0
> [  133.896099]  ? __pfx_memory_subsys_online+0x10/0x10
> [  133.899039]  ? xa_load+0x16d/0x2e0
> [  133.901667]  ? __pfx_xa_load+0x10/0x10
> [  133.904366]  ? __pfx_memory_subsys_online+0x10/0x10
> [  133.907218]  device_online+0x141/0x1d0
> [  133.909845]  online_memory_block+0x4d/0x60
> [  133.912494]  walk_memory_blocks+0xc0/0x120
> [  133.915104]  ? __pfx_online_memory_block+0x10/0x10
> [  133.917776]  add_memory_resource+0x51d/0x6c0
> [  133.920404]  ? __pfx_add_memory_resource+0x10/0x10
> [  133.923104]  ? _raw_write_unlock+0x31/0x60
> [  133.925781]  ? register_memory_resource+0x119/0x180
> [  133.928450]  add_memory_driver_managed+0xf5/0x180
> [  133.931036]  dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
> [  133.933665]  ? __pfx_dev_dax_kmem_probe+0x10/0x10 [kmem]
> [  133.936332]  ? __pfx___up_read+0x10/0x10
> [  133.938878]  dax_bus_probe+0x147/0x230
> [  133.941332]  ? __pfx_dax_bus_probe+0x10/0x10
> [  133.943954]  really_probe+0x27f/0xac0
> [  133.946387]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
> [  133.949106]  __driver_probe_device+0x1f3/0x460
> [  133.951704]  ? parse_option_str+0x149/0x190
> [  133.954241]  driver_probe_device+0x56/0x1b0
> [  133.956749]  __driver_attach+0x277/0x570
> [  133.959228]  ? __pfx___driver_attach+0x10/0x10
> [  133.961776]  bus_for_each_dev+0x145/0x1e0
> [  133.964367]  ? __pfx_bus_for_each_dev+0x10/0x10
> [  133.967019]  ? __kasan_check_read+0x15/0x20
> [  133.969543]  ? _raw_spin_unlock+0x31/0x60
> [  133.972132]  driver_attach+0x49/0x60
> [  133.974536]  bus_add_driver+0x2f3/0x6b0
> [  133.977044]  driver_register+0x170/0x4b0
> [  133.979480]  __dax_driver_register+0x141/0x1b0
> [  133.982126]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
> [  133.984724]  dax_kmem_init+0x54/0xff0 [kmem]
> [  133.987284]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
> [  133.989965]  do_one_initcall+0x117/0x5d0
> [  133.992506]  ? __pfx_do_one_initcall+0x10/0x10
> [  133.995185]  ? __kasan_kmalloc+0x88/0xa0
> [  133.997748]  ? kasan_poison+0x3e/0x60
> [  134.000288]  ? kasan_unpoison+0x2c/0x60
> [  134.002762]  ? kasan_poison+0x3e/0x60
> [  134.005202]  ? __asan_register_globals+0x62/0x80
> [  134.007753]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
> [  134.010439]  do_init_module+0x277/0x750
> [  134.012953]  load_module+0x5d1d/0x74f0
> [  134.015406]  ? __pfx_load_module+0x10/0x10
> [  134.017887]  ? __pfx_ima_post_read_file+0x10/0x10
> [  134.020470]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [  134.023127]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [  134.025767]  ? security_kernel_post_read_file+0xa2/0xd0
> [  134.028429]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [  134.031162]  ? kernel_read_file+0x503/0x820
> [  134.033645]  ? __pfx_kernel_read_file+0x10/0x10
> [  134.036232]  ? __pfx___lock_acquire+0x10/0x10
> [  134.038766]  init_module_from_file+0x12c/0x1a0
> [  134.041291]  ? init_module_from_file+0x12c/0x1a0
> [  134.043936]  ? __pfx_init_module_from_file+0x10/0x10
> [  134.046516]  ? __this_cpu_preempt_check+0x21/0x30
> [  134.049091]  ? __kasan_check_read+0x15/0x20
> [  134.051551]  ? do_raw_spin_unlock+0x60/0x210
> [  134.054077]  idempotent_init_module+0x3f1/0x690
> [  134.056643]  ? __pfx_idempotent_init_module+0x10/0x10
> [  134.059318]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [  134.061995]  ? __fget_light+0x17d/0x210
> [  134.064428]  __x64_sys_finit_module+0x10e/0x1a0
> [  134.066976]  x64_sys_call+0x184d/0x20d0
> [  134.069405]  do_syscall_64+0x6d/0x140
> [  134.071926]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Fixes: 823430c8e9d9 ("memory tier: consolidate the initialization of memory tiers")
> Signed-off-by: Yanfei Xu <yanfei.xu@xxxxxxxxx>
> ---
>  mm/memory-tiers.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 4775b3a3dabe..dddcd6b38e28 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -895,7 +895,6 @@ static int __init memory_tier_init(void)
>  	WARN_ON(!node_demotion);
>  #endif
>  
> -	guard(mutex)(&memory_tier_lock);
>  	/*
>  	 * For now we can have 4 faster memory tiers with smaller adistance
>  	 * than default DRAM tier.

Although it's not absolutely necessary, I still think that it's better
to just revert the locking change of memory_tier_init() in commit
823430c8e9d9 ("memory tier: consolidate the initialization of memory
tiers").  That is, use mutex_lock()/mutex_unlock() and exclude
hotplug_memory_notifier() from the locked region.  That way we always
hold memory_tier_lock when working on the memory tier related data
structures, as in the sketch below.
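
Something along the lines of the following untested sketch (subsystem
registration and tier setup elided; essentially the structure before
the guard() conversion), where the notifier registration only happens
after memory_tier_lock has been dropped:

static int __init memory_tier_init(void)
{
	/* ... subsys registration and node_demotion allocation elided ... */

	mutex_lock(&memory_tier_lock);
	/*
	 * Set up the default DRAM tier and default_dram_type here, with
	 * memory_tier_lock held as for any other update of the memory
	 * tier data structures.  (Details elided.)
	 */
	mutex_unlock(&memory_tier_lock);

	/*
	 * Register the hotplug callback outside the locked region, so
	 * that lockdep never records a memory_tier_lock ->
	 * (memory_chain).rwsem dependency.
	 */
	hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
	return 0;
}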

--
Best Regards,
Huang, Ying



