On 9/6/23 21:18, Lorenzo Stoakes wrote: > On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: >> >> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote: >> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote: >> > > From: Zqiang <qiang.zhang1211@xxxxxxxxx> >> > > >> > > Currently, for double invoke call_rcu(), will dump rcu_head objects >> > > memory info, if the objects is not allocated from the slab allocator, >> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock >> > > need to be held, since the call_rcu() can be invoked in interrupt context, >> > > therefore, there is a possibility of spinlock deadlock scenarios. >> > > >> > > And in Preempt-RT kernel, the rcutorture test also trigger the following >> > > lockdep warning: >> > > >> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 >> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 >> > > preempt_count: 1, expected: 0 >> > > RCU nest depth: 1, expected: 1 >> > > 3 locks held by swapper/0/1: >> > > #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0 >> > > #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370 >> > > #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70 >> > > irq event stamp: 565512 >> > > hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940 >> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370 >> > > softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170 >> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0 >> > > Preemption disabled at: >> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370 >> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15 >> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 >> > > Call Trace: >> > > <TASK> >> > > dump_stack_lvl+0x68/0xb0 >> > > dump_stack+0x14/0x20 >> > > __might_resched+0x1aa/0x280 >> > > ? __pfx_rcu_torture_err_cb+0x10/0x10 >> > > rt_spin_lock+0x53/0x130 >> > > ? find_vmap_area+0x1f/0x70 >> > > find_vmap_area+0x1f/0x70 >> > > vmalloc_dump_obj+0x20/0x60 >> > > mem_dump_obj+0x22/0x90 >> > > __call_rcu_common+0x5bf/0x940 >> > > ? debug_smp_processor_id+0x1b/0x30 >> > > call_rcu_hurry+0x14/0x20 >> > > rcu_torture_init+0x1f82/0x2370 >> > > ? __pfx_rcu_torture_leak_cb+0x10/0x10 >> > > ? __pfx_rcu_torture_leak_cb+0x10/0x10 >> > > ? __pfx_rcu_torture_init+0x10/0x10 >> > > do_one_initcall+0x6c/0x300 >> > > ? debug_smp_processor_id+0x1b/0x30 >> > > kernel_init_freeable+0x2b9/0x540 >> > > ? __pfx_kernel_init+0x10/0x10 >> > > kernel_init+0x1f/0x150 >> > > ret_from_fork+0x40/0x50 >> > > ? __pfx_kernel_init+0x10/0x10 >> > > ret_from_fork_asm+0x1b/0x30 >> > > </TASK> >> > > >> > > The previous patch fixes this by using the deadlock-safe best-effort >> > > version of find_vm_area. However, in case of failure print the fact that >> > > the pointer was a vmalloc pointer so that we print at least something. >> > > >> > > Reported-by: Zhen Lei <thunder.leizhen@xxxxxxxxxxxxxxx> >> > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx> >> > > Cc: rcu@xxxxxxxxxxxxxxx >> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> >> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") >> > > Cc: stable@xxxxxxxxxxxxxxx >> > > Signed-off-by: Zqiang <qiang.zhang1211@xxxxxxxxx> >> > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> >> > > --- >> > > mm/util.c | 4 +++- >> > > 1 file changed, 3 insertions(+), 1 deletion(-) >> > > >> > > diff --git a/mm/util.c b/mm/util.c >> > > index dd12b9531ac4..406634f26918 100644 >> > > --- a/mm/util.c >> > > +++ b/mm/util.c >> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object) >> > > if (vmalloc_dump_obj(object)) >> > > return; >> > > >> > > - if (virt_addr_valid(object)) >> > > + if (is_vmalloc_addr(object)) >> > > + type = "vmalloc memory"; >> > > + else if (virt_addr_valid(object)) >> > > type = "non-slab/vmalloc memory"; >> > >> > I think you should update this to say non-slab/non-vmalloc memory (as much >> > as that description sucks!) as this phrasing in the past meant to say >> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be >> > clear. >> >> True, though the issue you mentioned it is in existing code, a respin of this >> patch could update it to say non-vmalloc. Good point, thanks for reviewing! > > No it's not, you're changing the meaning, because you changed the code > that determines the output... I think it has always meant (but clearly it's not unambiguously worded) "not slab && not vmalloc", that is before and after this patch. Only in case patch 1 is applied and patch 2 not, can the output be wrong in that a vmalloc pointer will (in case of trylock fail) be classified as "not slab && not vmalloc", but seems fine to me after patch 2. I guess if we wanted, we could also rewrite it to be more like the kmem check in the beginning of mem_dump_obj(), so there would be: if (is_vmalloc_addr(...)) { vmalloc_dump_obj(...); return; } where vmalloc_dump_obj() itself would print at least "vmalloc memory" with no further details in case of trylock failure. that assumes is_vmalloc_addr() is guaranteed to be true for all addresses that __find_vmap_area resolves, otherwise it could miss something compared to current code. Is it guaranteed? > This has been merged now despite my outstanding comments (!) so I > guess I'll have to send a follow up patch to address this. > >> >> - Joel >> > > > > -- > Lorenzo Stoakes > https://ljs.io