Re: rcutorture: Sleeping function called from invalid context

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>
> On Wed, Aug 16, 2023 at 08:25:49AM -0400, Joel Fernandes wrote:
> > > On Aug 16, 2023, at 5:30 AM, Z qiang <qiang.zhang1211@xxxxxxxxx> wrote:
> > >>>> On Tue, Aug 15, 2023 at 1:44 AM Z qiang <qiang.zhang1211@xxxxxxxxx> wrote:
> > >>>>>
> > >>>>> Hello, Paul
> > >>>>>
> > >>>>> when running the rcutorture test with rcutorture.object_debug=1 in
> > >>>>> PREEMPT_RT kernel.
> > >>>>>
> > >>>>> BUG: sleeping function called from invalid context at
> > >>>>> kernel/locking/spinlock_rt.c:48
> > >>>>> [    3.116503] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid:
> > >>>>> 1, name: swapper/0
> > >>>>> [    3.116503] preempt_count: 1, expected: 0
> > >>>>> [    3.116503] RCU nest depth: 1, expected: 1
> > >>>>> [    3.116503] 3 locks held by swapper/0/1:
> > >>>>> [    3.116503]  #0: ffffffff8f862d80 (fullstop_mutex){+.+.}-{4:4}, at:
> > >>>>> torture_init_begin+0x24/0xa0
> > >>>>> [    3.116503]  #1: ffffffff8f7b1400 (rcu_read_lock){....}-{1:3}, at:
> > >>>>> rcu_torture_init+0x20b9/0x2380
> > >>>>> [    3.116503]  #2: ffffffff8f89fa20 (vmap_area_lock){+.+.}-{3:3}, at:
> > >>>>> find_vmap_area+0x21/0x90
> > >>>>> [    3.116503] irq event stamp: 571506
> > >>>>> [    3.116503] hardirqs last  enabled at (571505):
> > >>>>> [<ffffffff8d422ff5>] __call_rcu_common+0x3a5/0xdc0
> > >>>>> [    3.116503] hardirqs last disabled at (571506):
> > >>>>> [<ffffffff900487f7>] rcu_torture_init+0x22b7/0x2380
> > >>>>> [    3.116503] softirqs last  enabled at (406916):
> > >>>>> [<ffffffff8d2e91f2>] __local_bh_enable_ip+0x132/0x170
> > >>>>> [    3.116503] softirqs last disabled at (406910):
> > >>>>> [<ffffffff8e658e79>] inet_register_protosw+0x9/0x190
> > >>>>> [    3.116503] Preemption disabled at:
> > >>>>> [    3.116503] [<ffffffff9004865a>] rcu_torture_init+0x211a/0x2380
> > >>>>> [    3.116503] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G        W
> > >>>>>   6.5.0-rc4-rt2-yocto-preempt-rt+ #7
> > >>>>> [    3.116503] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > >>>>> BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > >>>>> [    3.116503] Call Trace:
> > >>>>> [    3.116503]  <TASK>
> > >>>>> [    3.116503]  dump_stack_lvl+0x68/0xb0
> > >>>>> [    3.116503]  dump_stack+0x14/0x20
> > >>>>> [    3.116503]  __might_resched+0x221/0x2f0
> > >>>>> [    3.116503]  rt_spin_lock+0x60/0x130
> > >>>>> [    3.116503]  ? find_vmap_area+0x21/0x90
> > >>>>> [    3.116503]  find_vmap_area+0x21/0x90
> > >>>>> [    3.116503]  vmalloc_dump_obj+0x25/0xa0
> > >>>>> [    3.116503]  mem_dump_obj+0x22/0x90
> > >>>>> [    3.116503]  __call_rcu_common+0x920/0xdc0
> > >>>>> [    3.116503]  ? __pfx_rcu_torture_err_cb+0x10/0x10
> > >>>>> [    3.116503]  ? __pfx___call_rcu_common+0x10/0x10
> > >>>>> [    3.116503]  ? rcu_torture_init+0x22b7/0x2380
> > >>>>> [    3.116503]  ? debug_smp_processor_id+0x1b/0x30
> > >>>>> [    3.116503]  ? rcu_is_watching+0x3e/0xc0
> > >>>>> [    3.116503]  call_rcu_hurry+0x14/0x20
> > >>>>> [    3.116503]  rcu_torture_init+0x2186/0x2380
> > >>>>> [    3.116503]  ? __pfx_rcu_torture_init+0x10/0x10
> > >>>>> [    3.116503]  ? __pfx_rt_mutex_slowunlock+0x10/0x10
> > >>>>> [    3.116503]  ? __pfx_irq_debugfs_init+0x10/0x10
> > >>>>> [    3.116503]  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > >>>>> [    3.116503]  ? rt_spin_unlock+0x50/0xa0
> > >>>>> [    3.116503]  ? add_device_randomness+0xb1/0xe0
> > >>>>> [    3.116503]  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > >>>>> [    3.116503]  ? __pfx_rcu_torture_init+0x10/0x10
> > >>>>> [    3.116503]  do_one_initcall+0xd8/0x430
> > >>>>> [    3.116503]  ? __pfx_do_one_initcall+0x10/0x10
> > >>>>> [    3.116503]  ? __kasan_check_read+0x15/0x20
> > >>>>> [    3.116503]  kernel_init_freeable+0x3a0/0x670
> > >>>>> [    3.116503]  ? rt_spin_unlock+0x50/0xa0
> > >>>>> [    3.116503]  ? __pfx_kernel_init+0x10/0x10
> > >>>>> [    3.116503]  kernel_init+0x23/0x160
> > >>>>> [    3.116503]  ? __pfx_kernel_init+0x10/0x10
> > >>>>> [    3.116503]  ret_from_fork+0x40/0x60
> > >>>>> [    3.116503]  ? __pfx_kernel_init+0x10/0x10
> > >>>>> [    3.116503]  ret_from_fork_asm+0x1b/0x30
> > >>>>> [    3.116503]  </TASK>
> > >>>>> [    3.116503]  non-slab/vmalloc memory
> > >>>>>
> > >>>>>
> > >>>>> as described by Lei Zhen:
> > >>>>>
> > >>>>> v2 --> v3:
> > >>>>> 1. I made statistics about the source of 'rhp'. kmem_valid_obj() accounts for
> > >>>>>   more than 97.5%, and vmalloc accounts for less than 1%. So change call
> > >>>>>   mem_dump_obj() to call kmem_dump_obj() can meet debugging requirements and
> > >>>>>   avoid the potential deadlock risk of vmalloc_dump_obj().
> > >>>>> -               mem_dump_obj(rhp);
> > >>>>> +               if (kmem_valid_obj(rhp))
> > >>>>> +                       kmem_dump_obj(rhp);
> > >>>>>
> > >>>
> > >>> I kind of dislike that even if it is just 1%.
> > >>>
> > >>> Another option is, Could you do the object dump from irq_work if it is
> > >>> a vmalloc address? That way there will be no vmap area lock held and
> > >>> no need to use workqueue or depend on schedule. And also disable the
> > >>> dump for PREEMPT_RT (since that's SWA violation).
> > >>
> > >>>
> > >>
> > >> Hello Joel
> > >>
> > >> Actually I want to know, is it necessary to dump the information of vmalloc
> >
> > I see. That is worth looking into..
> >
> > >>
> > >>
> > >>>
> > >>> I feel like this is a PREEMPT_RT problem and we should not make the
> > >>> !PREEMPT_RT debugging less effective for their sake.
> > >>>
> > >>> Maybe soemthing like:
> > >>>
> > >>> if (kmem_valid_obj(rhp)) {
> > >>>  kmem_dump_obj(rhp);
> > >>> } else  if (!in_interrupt() &&  !IS_ENABLED(CONFIG_PREEMPT_RT) {
> > >>>    mem_dump_obj(rhp);
> > >>> } else if (!IS_ENABLED(CONFIG_PREEMPT_RT))  {
> > >>>    // Queue an irq work to dump, drop CB list on the floor like Paul
> > >>> suggested and adjust RCU
> > >>> }
> > >>
> > >> I did something like this earlier:
> > >>
> > >> https://lore.kernel.org/lkml/20221118003441.3980437-1-qiang1.zhang@xxxxxxxxx/
> >
> > Cool.
> >
> > >>
> > >>>
> > >>> ?
> > >>>
> > >>> Perhaps the above can be wrapped in a helper and generically used...
> > >>>
> > >>>>   The discussion about vmap_area_lock deadlock in v2:
> > >>>>   https://lkml.org/lkml/2022/11/11/493
> > >>>>
> > >>>> so whether mem_dump_obj() should also be replaced by kmem_dump_obj() in
> > >>>> __call_rcu_common(),  since the call_rcu() can be called in interrupt context,
> > >>>> this may issue recursive spin vmap_area_lock  lock calls.
> > >>>
> > >>> That's why it could be done from irq_work.
> > >>>
> > >>> Thoughts?
> > >
> > > Even with irq_work to dump vmalloc info also need held vmap_area_lock spinlock,
> > > need to make sure that the vm_struct object is not freed
> >
> > This is true, I realized only after just sending my email. The trick I was referring
> > to was avoidance of deadlocks for irq disabled spin locks but this one is not.
> >
> > So perhaps WQ it is then.. or dropping vmalloc debug..
>
> But only for PREEMPT_RT, correct?  Or does the problem also extend to
> !PREEMPT_RT?
>

Hello, Paul

In the Preempt-RT kernel, due to acquire spin_lock in irq disable
critical section,
this triggers lockdep warning.
In non-Preempt-RT kernel, there is a risk of deadlock here(), when call_rcu() is
invoked in irq context and due to double call_rcu() trigger vmalloc debug,
(https://lkml.org/lkml/2022/11/11/493)
In an earlier discussion, I did something like
https://lore.kernel.org/lkml/20221118003441.3980437-1-qiang1.zhang@xxxxxxxxx/

but now, like lei said, vmalloc accounts for less than 1%,  so is
vmalloc debug necessary?
or dump vmalloc debug info in clean context.

Thanks
Zqiang




>
>                                                         Thanx, Paul




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux