Re: [PATCH] kernfs: convert kernfs_idr_lock to an irq safe raw spinlock

Andrea Righi <andrea.righi@xxxxxxxxxxxxx> · Tue, 9 Jan 2024 18:05:09 +0100

On Tue, Jan 09, 2024 at 05:35:36PM +0100, Geert Uytterhoeven wrote:
...
> Thanks for your patch, which is now commit c312828c37a72fe2
> ("kernfs: convert kernfs_idr_lock to an irq safe raw spinlock")
> in driver-core/driver-core-next.
> 
> Unfortunately this interacts badly with commit 4eff7d62abdeb293 ("Revert
> "mm/kmemleak: move the initialisation of object to __link_object"")
> in v6.7-rc5.
> 
> driver-core/driver-core-next is still at v6.7-rc3, so it does not
> yet have commit 4eff7d62abdeb293, and thus still triggers:
> 
>     =============================
>     [ BUG: Invalid wait context ]
>     6.7.0-rc3-kzm9g-00052-gc312828c37a7 #576 Not tainted
>     -----------------------------
>     swapper/0 is trying to lock:
>     c0c6e3c4 (&zone->lock){....}-{3:3}, at: __rmqueue_pcplist+0x358/0x3c8
>     other info that might help us debug this:
>     context-{5:5}
>     3 locks held by swapper/0:
>      #0: c0bf35a0 (slab_mutex){....}-{4:4}, at:
> kmem_cache_create_usercopy+0xc8/0x2d0
>      #1: c0bfab0c (kmemleak_lock){....}-{2:2}, at: __create_object+0x2c/0x7c
>      #2: dfbc8c90 (&pcp->lock){....}-{3:3}, at:
> get_page_from_freelist+0x1a0/0x684
>     stack backtrace:
>     CPU: 0 PID: 0 Comm: swapper Not tainted
> 6.7.0-rc3-kzm9g-00052-gc312828c37a7 #576
>     Hardware name: Generic SH73A0 (Flattened Device Tree)
>      unwind_backtrace from show_stack+0x10/0x14
>      show_stack from dump_stack_lvl+0x68/0x90
>      dump_stack_lvl from __lock_acquire+0x3cc/0x168c
>      __lock_acquire from lock_acquire+0x274/0x30c
>      lock_acquire from _raw_spin_lock_irqsave+0x50/0x64
>      _raw_spin_lock_irqsave from __rmqueue_pcplist+0x358/0x3c8
>      __rmqueue_pcplist from get_page_from_freelist+0x3bc/0x684
>      get_page_from_freelist from __alloc_pages+0xe8/0xad8
>      __alloc_pages from __stack_depot_save+0x160/0x398
>      __stack_depot_save from set_track_prepare+0x48/0x74
>      set_track_prepare from __link_object+0xac/0x204
>      __link_object from __create_object+0x48/0x7c
>      __create_object from kmemleak_alloc+0x2c/0x38
>      kmemleak_alloc from slab_post_alloc_hook.constprop.0+0x9c/0xac
>      slab_post_alloc_hook.constprop.0 from kmem_cache_alloc+0xcc/0x148
>      kmem_cache_alloc from kmem_cache_create_usercopy+0x1c4/0x2d0
>      kmem_cache_create_usercopy from kmem_cache_create+0x1c/0x24
>      kmem_cache_create from kmemleak_init+0x58/0xfc
>      kmemleak_init from mm_core_init+0x244/0x2c8
>      mm_core_init from start_kernel+0x274/0x528
>      start_kernel from 0x0
> 
> After merging driver-core/driver-core-next into a tree based on
> v6.7-rc5, or after cherry-picking commit 4eff7d62abdeb293 into
> driver-core/driver-core-next, the above BUG is gone, but a different
> one appears:
> 
>     =============================
>     [ BUG: Invalid wait context ]
>     6.7.0-rc5-kzm9g-00251-g655022a45b1c #578 Not tainted
>     -----------------------------
>     swapper/0/0 is trying to lock:
>     dfbcd488 (&c->lock){....}-{3:3}, at: local_lock_acquire+0x0/0xa4
>     other info that might help us debug this:
>     context-{5:5}
>     2 locks held by swapper/0/0:
>      #0: dfbc9c60 (lock){+.+.}-{3:3}, at: local_lock_acquire+0x0/0xa4
>      #1: c0c012a8 (kernfs_idr_lock){....}-{2:2}, at:
> __kernfs_new_node.constprop.0+0x68/0x258
>     stack backtrace:
>     CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578
>     Hardware name: Generic SH73A0 (Flattened Device Tree)
>      unwind_backtrace from show_stack+0x10/0x14
>      show_stack from dump_stack_lvl+0x68/0x90
>      dump_stack_lvl from __lock_acquire+0x3cc/0x168c
>      __lock_acquire from lock_acquire+0x274/0x30c
>      lock_acquire from local_lock_acquire+0x28/0xa4
>      local_lock_acquire from ___slab_alloc+0x234/0x8a8
>      ___slab_alloc from __slab_alloc.constprop.0+0x30/0x44
>      __slab_alloc.constprop.0 from kmem_cache_alloc+0x7c/0x148
>      kmem_cache_alloc from radix_tree_node_alloc.constprop.0+0x44/0xdc
>      radix_tree_node_alloc.constprop.0 from idr_get_free+0x110/0x2b8
>      idr_get_free from idr_alloc_u32+0x9c/0x108
>      idr_alloc_u32 from idr_alloc_cyclic+0x50/0xb8
>      idr_alloc_cyclic from __kernfs_new_node.constprop.0+0x88/0x258
>      __kernfs_new_node.constprop.0 from kernfs_create_root+0xbc/0x154
>      kernfs_create_root from sysfs_init+0x18/0x5c
>      sysfs_init from mnt_init+0xc4/0x220
>      mnt_init from vfs_caches_init+0x6c/0x88
>      vfs_caches_init from start_kernel+0x474/0x528
>      start_kernel from 0x0
> 
> Reverting commit c312828c37a72fe2 fixes that.
> I have seen this issue on several Renesas arm32 and arm64 platforms.
> 
> Also, I am wondering if the issue fixed by commit c312828c37a72fe2
> can still be reproduced on v6.7-rc5 or later?

Yep, I can still reproduce it (this is with v6.7):

[    3.082273] 
[    3.082822] =====================================================
[    3.084543] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[    3.086252] 6.7.0-virtme #4 Not tainted
[    3.087002] -----------------------------------------------------
[    3.087385] swapper/5/0 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[    3.087768] ffffffff8f9c5378 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1d/0x80
[    3.088335] 
[    3.088335] and this task is already holding:
[    3.088685] ffff8a83becbf758 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xda/0xef0
[    3.089128] which would create a new lock dependency:
[    3.089435]  (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
[    3.089827] 
[    3.089827] but this new dependency connects a HARDIRQ-irq-safe lock:
[    3.090296]  (&rq->__lock){-.-.}-{2:2}
[    3.090297] 
[    3.090297] ... which became HARDIRQ-irq-safe at:
[    3.090885]   lock_acquire+0xcb/0x2c0
[    3.091108]   _raw_spin_lock_nested+0x2e/0x40
[    3.091374]   scheduler_tick+0x5b/0x3d0
[    3.091607]   update_process_times+0x9c/0xb0
[    3.091867]   tick_periodic+0x27/0xe0
[    3.092089]   tick_handle_periodic+0x24/0x70
[    3.092351]   timer_interrupt+0x18/0x30
[    3.092585]   __handle_irq_event_percpu+0x8b/0x240
[    3.092865]   handle_irq_event+0x38/0x80
[    3.093095]   handle_level_irq+0x90/0x170
[    3.093340]   __common_interrupt+0x4a/0xf0
[    3.093586]   common_interrupt+0x83/0xa0
[    3.093820]   asm_common_interrupt+0x26/0x40
[    3.094080]   _raw_spin_unlock_irqrestore+0x36/0x70
[    3.094381]   __setup_irq+0x441/0x6a0
[    3.094602]   request_threaded_irq+0xe5/0x190
[    3.094862]   hpet_time_init+0x3a/0x60
[    3.095090]   x86_late_time_init+0x1b/0x40
[    3.095344]   start_kernel+0x53a/0x6a0
[    3.095569]   x86_64_start_reservations+0x18/0x30
[    3.095849]   x86_64_start_kernel+0xc5/0xe0
[    3.096097]   secondary_startup_64_no_verify+0x178/0x17b
[    3.096426] 
[    3.096426] to a HARDIRQ-irq-unsafe lock:
[    3.096749]  (kernfs_idr_lock){+.+.}-{2:2}
[    3.096751] 
[    3.096751] ... which became HARDIRQ-irq-unsafe at:
[    3.097372] ...
[    3.097372]   lock_acquire+0xcb/0x2c0
[    3.097701]   _raw_spin_lock+0x30/0x40
[    3.097925]   __kernfs_new_node.isra.0+0x83/0x280
[    3.098205]   kernfs_create_root+0xf6/0x1d0
[    3.098463]   sysfs_init+0x1b/0x70
[    3.098670]   mnt_init+0xd9/0x2a0
[    3.098872]   vfs_caches_init+0xcf/0xe0
[    3.099105]   start_kernel+0x58a/0x6a0
[    3.099334]   x86_64_start_reservations+0x18/0x30
[    3.099613]   x86_64_start_kernel+0xc5/0xe0
[    3.099862]   secondary_startup_64_no_verify+0x178/0x17b
[    3.100175] 
[    3.100175] other info that might help us debug this:
[    3.100175] 
[    3.100652]  Possible interrupt unsafe locking scenario:
[    3.100652] 
[    3.101049]        CPU0                    CPU1
[    3.101323]        ----                    ----
[    3.101641]   lock(kernfs_idr_lock);
[    3.101909]                                local_irq_disable();
[    3.102473]                                lock(&rq->__lock);
[    3.102854]                                lock(kernfs_idr_lock);
[    3.103171]   <Interrupt>
[    3.103308]     lock(&rq->__lock);
[    3.103492] 
[    3.103492]  *** DEADLOCK ***

I'm wondering if using a regular spinlock instead of a raw spinlock
could be a reasonable compromise.

We have a GFP_ATOMIC allocation in __kernfs_new_node():

	raw_spin_lock_irqsave(&kernfs_idr_lock, irqflags);
	ret = idr_alloc_cyclic(&root->ino_idr, kn, 1, 0, GFP_ATOMIC);
	...
        raw_spin_unlock_irqrestore(&kernfs_idr_lock, irqflags);

That should become valid using a
spin_lock_irqsave/spin_unlock_irqrestore(), right?

Thanks,
-Andrea

> 
> Thanks!
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds