On 12/1/23 14:08, Catalin Marinas wrote:
On systems with large number of CPUs, the following soft lockup splat might sometimes happen: [ 2656.001617] watchdog: BUG: soft lockup - CPU#364 stuck for 21s! [ksoftirqd/364:2206] : [ 2656.141194] RIP: 0010:_raw_spin_unlock_irqrestore+0x3d/0x70 : 2656.241214] Call Trace: [ 2656.243971] <IRQ> [ 2656.246237] ? show_trace_log_lvl+0x1c4/0x2df [ 2656.251152] ? show_trace_log_lvl+0x1c4/0x2df [ 2656.256066] ? kmemleak_free_percpu+0x11f/0x1f0 [ 2656.261173] ? watchdog_timer_fn+0x379/0x470 [ 2656.265984] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 2656.271179] ? __hrtimer_run_queues+0x5f3/0xd00 [ 2656.276283] ? __pfx___hrtimer_run_queues+0x10/0x10 [ 2656.281783] ? ktime_get_update_offsets_now+0x95/0x2c0 [ 2656.287573] ? ktime_get_update_offsets_now+0xdd/0x2c0 [ 2656.293380] ? hrtimer_interrupt+0x2e9/0x780 [ 2656.298221] ? __sysvec_apic_timer_interrupt+0x184/0x640 [ 2656.304211] ? sysvec_apic_timer_interrupt+0x8e/0xc0 [ 2656.309807] </IRQ> [ 2656.312169] <TASK> [ 2656.326110] kmemleak_free_percpu+0x11f/0x1f0 [ 2656.331015] free_percpu.part.0+0x1b/0xe70 [ 2656.335635] free_vfsmnt+0xb9/0x100 [ 2656.339567] rcu_do_batch+0x3c8/0xe30 [ 2656.363693] rcu_core+0x3de/0x5a0 [ 2656.367433] __do_softirq+0x2d0/0x9a8 [ 2656.381119] run_ksoftirqd+0x36/0x60 [ 2656.385145] smpboot_thread_fn+0x556/0x910 [ 2656.394971] kthread+0x2a4/0x350 [ 2656.402826] ret_from_fork+0x29/0x50 [ 2656.406861] </TASK> The issue is caused by kmemleak registering each per_cpu_ptr() corresponding to the __percpu pointer. This is unnecessary since such individual per-CPU pointers are not tracked anyway. Create a new object_percpu_tree_root rbtree that stores a single __percpu pointer together with an OBJECT_PERCPU flag for the kmemleak metadata. Scanning needs to be done for all per_cpu_ptr() pointers with a cond_resched() between each CPU iteration to avoid RCU stalls. Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx> Reported-by: Waiman Long <longman@xxxxxxxxxx> Link: https://lore.kernel.org/r/20231127194153.289626-1-longman@xxxxxxxxxx Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- The only difference from the inlined patch I posted previously is some updated comments to include the new object tree. mm/kmemleak.c | 178 +++++++++++++++++++++++++++----------------------- 1 file changed, 97 insertions(+), 81 deletions(-) diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 1eacca03bedd..eb6cdc3e9af2 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -14,17 +14,15 @@ * The following locks and mutexes are used by kmemleak: * * - kmemleak_lock (raw_spinlock_t): protects the object_list as well as - * del_state modifications and accesses to the object_tree_root (or - * object_phys_tree_root). The object_list is the main list holding the - * metadata (struct kmemleak_object) for the allocated memory blocks. - * The object_tree_root and object_phys_tree_root are red - * black trees used to look-up metadata based on a pointer to the - * corresponding memory block. The object_phys_tree_root is for objects - * allocated with physical address. The kmemleak_object structures are - * added to the object_list and object_tree_root (or object_phys_tree_root) - * in the create_object() function called from the kmemleak_alloc() (or - * kmemleak_alloc_phys()) callback and removed in delete_object() called from - * the kmemleak_free() callback + * del_state modifications and accesses to the object trees + * (object_tree_root, object_phys_tree_root, object_percpu_tree_root). The + * object_list is the main list holding the metadata (struct + * kmemleak_object) for the allocated memory blocks. The object trees are + * red black trees used to look-up metadata based on a pointer to the + * corresponding memory block. The kmemleak_object structures are added to + * the object_list and the object tree root in the create_object() function + * called from the kmemleak_alloc() (or kmemleak_alloc_phys()) callback and + * removed in delete_object() called from the kmemleak_free() callback
Just a minor nit. For completeness, should we mention kmemleak_alloc_percpu() and kmemleak_free_percpu() here?
Anyway, I won't mind if you want to keep it as it is. Reviewed-by: Waiman Long <longman@xxxxxxxxxx>