On Feb 21, 2025, at 10:46, Dennis Zhou <dennis@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Thu, Feb 20, 2025 at 03:37:26PM -0500, Kent Overstreet wrote:
>> On Thu, Feb 20, 2025 at 06:16:43PM +0100, Vlastimil Babka wrote:
>>> On 2/20/25 11:57, Alan Huang wrote:
>>>> Ping
>>>>
>>>>> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>>>>>
>>>>> Adding pcpu people to the CC
>>>>>
>>>>> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
>>>>>> The cycle:
>>>>>>
>>>>>> CPU0:               CPU1:
>>>>>> bc->lock            pcpu_alloc_mutex
>>>>>> pcpu_alloc_mutex    bc->lock
>>>>>>
>>>>>> Reported-by: syzbot+fe63f377148a6371a9db@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>>> Tested-by: syzbot+fe63f377148a6371a9db@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>>> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx>
>>>>>
>>>>> So pcpu_alloc_mutex -> fs_reclaim?
>>>>>
>>>>> That's really awkward; seems like something that might invite more
>>>>> issues. We can apply your fix if we need to, but I want to hear what
>>>>> the percpu people have to say first.
>>>>>
>>>>> ======================================================
>>>>> WARNING: possible circular locking dependency detected
>>>>> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
>>>>> ------------------------------------------------------
>>>>> syz.0.21/5625 is trying to acquire lock:
>>>>> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>>>>>
>>>>> but task is already holding lock:
>>>>> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>>>>>
>>>>> which lock already depends on the new lock.
>>>>>
>>>>>
>>>>> the existing dependency chain (in reverse order) is:
>>>>>
>>>>> -> #2 (&bc->lock){+.+.}-{4:4}:
>>>>>        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>>>>        __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>>>>>        __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>>>>>        bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
>>>>>        do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
>>>>>        shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
>>>>>        shrink_one+0x43b/0x850 mm/vmscan.c:4868
>>>>>        shrink_many mm/vmscan.c:4929 [inline]
>>>>>        lru_gen_shrink_node mm/vmscan.c:5007 [inline]
>>>>>        shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
>>>>>        kswapd_shrink_node mm/vmscan.c:6807 [inline]
>>>>>        balance_pgdat mm/vmscan.c:6999 [inline]
>>>>>        kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
>>>>>        kthread+0x7a9/0x920 kernel/kthread.c:464
>>>>>        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>>>>        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>>>>
>>>>> -> #1 (fs_reclaim){+.+.}-{0:0}:
>>>>>        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>>>>        __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
>>>>>        fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
>>>>>        might_alloc include/linux/sched/mm.h:318 [inline]
>>>>>        slab_pre_alloc_hook mm/slub.c:4066 [inline]
>>>>>        slab_alloc_node mm/slub.c:4144 [inline]
>>>>>        __do_kmalloc_node mm/slub.c:4293 [inline]
>>>>>        __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
>>>>>        kmalloc_noprof include/linux/slab.h:905 [inline]
>>>>>        kzalloc_noprof include/linux/slab.h:1037 [inline]
>>>>>        pcpu_mem_zalloc mm/percpu.c:510 [inline]
>>>>>        pcpu_alloc_chunk mm/percpu.c:1430 [inline]
>>>>>        pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
>>>>>        pcpu_balance_populated mm/percpu.c:2063 [inline]
>>>>>        pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
>>>>>        process_one_work kernel/workqueue.c:3236 [inline]
>>>>>        process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
>>>>>        worker_thread+0x870/0xd30 kernel/workqueue.c:3398
>>>>>        kthread+0x7a9/0x920 kernel/kthread.c:464
>>>>>        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>>>>        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>>
>>> Seeing this as part of the chain (fs reclaim from a worker doing
>>> pcpu_balance_workfn) makes me think Michal's patch could be a fix to this:
>>>
>>> https://lore.kernel.org/all/20250206122633.167896-1-mhocko@xxxxxxxxxx/
>>
>> Thanks for the link - that does look like just the thing.
>
> Sorry I missed the first email asking to weigh in.
>
> Michal's problem is a little bit different from what's happening here.
> He's hitting an issue where an alloc_percpu_gfp(NOFS/NOIO) is considered
> atomic and fails during probing. This is because we don't have enough
> percpu memory backed to fulfill the "atomic" requests.
>
> Historically we've considered any allocation that's not GFP_KERNEL to be
> atomic. Here it seems like the alloc_percpu() behind bc->lock should
> have been an "atomic" allocation to prevent the lock cycle?

I think so. If I understand it correctly, NOFS/NOIO can still invoke the
shrinker, so bc->lock can be taken again. And I think we should not rely on
the implementation details of alloc_percpu_gfp(), but on the GFP flags
instead (rough sketches of both points below). Correct me if I'm wrong.

> Thanks,
> Dennis
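To spell the first point out in code form, here is a rough sketch of the two
lock orderings (placeholder types and names, not the exact call sites;
pcpu_alloc_mutex is actually private to mm/percpu.c and appears here only to
make the ordering explicit):

    #include <linux/percpu.h>
    #include <linux/slab.h>
    #include <linux/mutex.h>

    struct foo { int x; };      /* placeholder */

    /* Thread A: bcachefs, cf. bch2_btree_node_mem_alloc() */
    static void thread_a(struct btree_cache *bc)
    {
            struct foo __percpu *p;

            mutex_lock(&bc->lock);
            /* GFP_KERNEL percpu alloc takes pcpu_alloc_mutex: */
            p = alloc_percpu(struct foo);
            mutex_unlock(&bc->lock);
            free_percpu(p);
    }

    /* Thread B: cf. pcpu_balance_workfn() */
    static void thread_b(void)
    {
            void *q;

            mutex_lock(&pcpu_alloc_mutex);
            /*
             * A GFP_KERNEL slab allocation may enter fs_reclaim, which
             * can run shrink_slab() -> bch2_btree_cache_scan() ->
             * bc->lock: the opposite ordering of thread_a(), hence the
             * lockdep splat above.
             */
            q = kzalloc(PAGE_SIZE, GFP_KERNEL);
            mutex_unlock(&pcpu_alloc_mutex);
            kfree(q);
    }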
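And for the second point, something like this on the allocating side
(hypothetical and untested; btree_counters and alloc_counters_under_lock are
placeholders I made up, not real bcachefs names). GFP_NOWAIT has
__GFP_DIRECT_RECLAIM clear, so the allocation can never enter reclaim while
bc->lock is held, regardless of how the percpu allocator classifies flags
internally:

    #include <linux/percpu.h>
    #include <linux/gfp.h>
    #include <linux/lockdep.h>

    struct btree_counters { u64 nr_ops; };      /* placeholder */

    static struct btree_counters __percpu *
    alloc_counters_under_lock(struct btree_cache *bc)
    {
            struct btree_counters __percpu *p;

            lockdep_assert_held(&bc->lock);

            /*
             * GFP_NOWAIT means gfpflags_allow_blocking() is false, so
             * this cannot recurse into fs_reclaim -> shrinker ->
             * bc->lock, whatever the allocator's internal "atomic"
             * heuristic is.
             */
            p = alloc_percpu_gfp(struct btree_counters, GFP_NOWAIT);

            /*
             * May be NULL when the backed percpu area is exhausted;
             * the caller would drop bc->lock and retry with GFP_KERNEL.
             */
            return p;
    }

The flip side is exactly what Michal hit: such "atomic" requests can fail
when not enough percpu memory is kept populated, so a blocking retry path
outside the lock is still needed.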