Re: [PATCH] mm: slub: annotate kmem_cache_node->list_lock as raw_spinlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2023/4/12 13:51, Boqun Feng wrote:
On Tue, Apr 11, 2023 at 10:25:06PM +0800, Qi Zheng wrote:


On 2023/4/11 22:19, Vlastimil Babka wrote:
On 4/11/23 16:08, Qi Zheng wrote:


On 2023/4/11 21:40, Vlastimil Babka wrote:
On 4/11/23 15:08, Qi Zheng wrote:
The list_lock can be held in the critical section of
raw_spinlock, and then lockdep will complain about it
like below:

    =============================
    [ BUG: Invalid wait context ]
    6.3.0-rc6-next-20230411 #7 Not tainted
    -----------------------------
    swapper/0/1 is trying to lock:
    ffff888100055418 (&n->list_lock){....}-{3:3}, at: ___slab_alloc+0x73d/0x1330
    other info that might help us debug this:
    context-{5:5}
    2 locks held by swapper/0/1:
     #0: ffffffff824e8160 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic+0x22/0x2d0
     #1: ffff888136bede50 (&ACCESS_PRIVATE(rtpcp, lock)){....}-{2:2}, at: cblist_init_generic+0x232/0x2d0
    stack backtrace:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230411 #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x77/0xc0
     __lock_acquire+0xa65/0x2950
     ? arch_stack_walk+0x65/0xf0
     ? arch_stack_walk+0x65/0xf0
     ? unwind_next_frame+0x602/0x8d0
     lock_acquire+0xe0/0x300
     ? ___slab_alloc+0x73d/0x1330
     ? find_usage_forwards+0x39/0x50
     ? check_irq_usage+0x162/0xa70
     ? __bfs+0x10c/0x2c0
     _raw_spin_lock_irqsave+0x4f/0x90
     ? ___slab_alloc+0x73d/0x1330
     ___slab_alloc+0x73d/0x1330
     ? fill_pool+0x16b/0x2a0
     ? look_up_lock_class+0x5d/0x160
     ? register_lock_class+0x48/0x500
     ? __lock_acquire+0xabc/0x2950
     ? fill_pool+0x16b/0x2a0
     kmem_cache_alloc+0x358/0x3b0
     ? __lock_acquire+0xabc/0x2950
     fill_pool+0x16b/0x2a0
     ? __debug_object_init+0x292/0x560
     ? lock_acquire+0xe0/0x300
     ? cblist_init_generic+0x232/0x2d0
     __debug_object_init+0x2c/0x560

This "__debug_object_init" is because INIT_WORK() is called in
cblist_init_generic(), so..

Yes, a more precise call stack is as follows:

cblist_init_generic
--> INIT_WORK
    --> lockdep_init_map
        --> lockdep_init_map_type
            --> register_lock_class
                --> init_data_structures_once
                    --> init_rcu_head
                        --> debug_object_init
                            --> __debug_object_init


     cblist_init_generic+0x147/0x2d0
     rcu_init_tasks_generic+0x15/0x190
     kernel_init_freeable+0x6e/0x3e0
     ? rest_init+0x1e0/0x1e0
     kernel_init+0x1b/0x1d0
     ? rest_init+0x1e0/0x1e0
     ret_from_fork+0x1f/0x30
     </TASK>

The fill_pool() can only be called in the !PREEMPT_RT kernel
or in the preemptible context of the PREEMPT_RT kernel, so
the above warning is not a real issue, but it's better to
annotate kmem_cache_node->list_lock as raw_spinlock to get
rid of such issue.

+ CC some RT and RCU people

Thanks.


AFAIK raw_spinlock is not just an annotation, but on RT it changes the
implementation from preemptible mutex to actual spin lock, so it would be

Yeah.

rather unfortunate to do that for a spurious warning. Can it be somehow
fixed in a better way?

... probably a better fix is to drop locks and call INIT_WORK(), or make
the cblist_init_generic() lockless (or part lockless), given it's just
initializing the cblist, it's probably doable. But I haven't taken a
careful look yet.

This might be a doable solution for this warning, but I also saw another stacks like the following on v5.15:

[   30.349171] Call Trace:
[   30.349171]  <TASK>
[   30.349171]  dump_stack_lvl+0x69/0x97
[   30.349171]  __lock_acquire+0x4a0/0x1aa0
[   30.349171]  lock_acquire+0x275/0x2e0
[   30.349171]  _raw_spin_lock_irqsave+0x4c/0x90
[   30.349171]  ___slab_alloc.constprop.95+0x3ea/0xa80
[   30.349171]  __slab_alloc.isra.89.constprop.94+0x1c/0x30
[   30.349171]  kmem_cache_alloc+0x2bd/0x320
[   30.349171]  fill_pool+0x1b2/0x2d0
[   30.349171]  __debug_object_init+0x2c/0x500
[   30.349171]  debug_object_activate+0x136/0x200
[   30.349171]  add_timer+0x10b/0x170
[   30.349171]  queue_delayed_work_on+0x63/0xa0
[   30.349171]  init_mm_internals+0x226/0x2b0
[   30.349171]  kernel_init_freeable+0x82/0x24e
[   30.349171]  kernel_init+0x17/0x140
[   30.349171]  ret_from_fork+0x1f/0x30
[   30.349171]  </TASK>

So I'm a bit confused whether to fix individual cases or should there be
a general solution.

Thanks,
Qi


Regards,
Boqun


It's indeed unfortunate for the warning in the commit message. But
functions like kmem_cache_alloc(GFP_ATOMIC) may indeed be called
in the critical section of raw_spinlock or in the hardirq context, which

Hmm, I thought they may not, actually.

will cause problem in the PREEMPT_RT kernel. So I still think it is
reasonable to convert kmem_cache_node->list_lock to raw_spinlock type.

It wouldn't be the complete solution anyway. Once we allow even a GFP_ATOMIC
slab allocation for such context, it means also page allocation can happen
to refill the slabs, so lockdep will eventually complain about zone->lock,
and who knows what else.

Oh, indeed. :(


In addition, there are many fix patches for this kind of warning in the
git log, so I also think there should be a general and better solution. :)

Maybe, but given above, I doubt it's this one.






--
Thanks,
Qi

--
Thanks,
Qi



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux