On Tue, Jul 5, 2022 at 9:18 PM <dai.ngo@xxxxxxxxxx> wrote:
>
> Hi,
>
> During my low-memory testing with 5.19.0-rc4, I ran into a problem in
> btrfs where a thread tries to sleep in an invalid context:
>
> Jul 3 04:08:44 nfsvmf24 kernel: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
> Jul 3 04:08:44 nfsvmf24 kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4308, name: nfsd
> Jul 3 04:08:44 nfsvmf24 kernel: preempt_count: 1, expected: 0
> Jul 3 04:08:44 nfsvmf24 kernel: RCU nest depth: 0, expected: 0
> Jul 3 04:08:44 nfsvmf24 kernel: 4 locks held by nfsd/4308:
> Jul 3 04:08:44 nfsvmf24 kernel: #0: ffff8881052d4438 (sb_writers#14){.+.+}-{0:0}, at: nfsd4_setattr+0x17f/0x3c0 [nfsd]
> Jul 3 04:08:44 nfsvmf24 kernel: #1: ffff888141e5aa50 (&sb->s_type->i_mutex_key#16){+.+.}-{3:3}, at: nfsd_setattr+0x56a/0xe00 [nfsd]
> Jul 3 04:08:44 nfsvmf24 kernel: #2: ffff8881052d4628 (sb_internal#2){.+.+}-{0:0}, at: btrfs_dirty_inode+0xda/0x1d0
> Jul 3 04:08:44 nfsvmf24 kernel: #3: ffff888105ed6628 (&root->inode_lock){+.+.}-{2:2}, at: btrfs_get_or_create_delayed_node+0x27a/0x430
> Jul 3 04:08:44 nfsvmf24 kernel: Preemption disabled at:
> Jul 3 04:08:44 nfsvmf24 kernel: [<0000000000000000>] 0x0
> Jul 3 04:08:44 nfsvmf24 kernel: CPU: 0 PID: 4308 Comm: nfsd Kdump: loaded Not tainted 5.19.0-rc4+ #1
> Jul 3 04:08:44 nfsvmf24 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> Jul 3 04:08:45 nfsvmf24 kernel: Call Trace:
> Jul 3 04:08:45 nfsvmf24 kernel: <TASK>
> Jul 3 04:08:45 nfsvmf24 kernel: dump_stack_lvl+0x57/0x7d
> Jul 3 04:08:45 nfsvmf24 kernel: __might_resched.cold+0x222/0x26b
> Jul 3 04:08:45 nfsvmf24 kernel: kmem_cache_alloc_lru+0x159/0x2c0
> Jul 3 04:08:45 nfsvmf24 kernel: __xas_nomem+0x1a5/0x5d0
> Jul 3 04:08:45 nfsvmf24 kernel: ? lock_acquire+0x1bb/0x500
> Jul 3 04:08:45 nfsvmf24 kernel: __xa_insert+0xff/0x1d0
> Jul 3 04:08:45 nfsvmf24 kernel: ? __xa_cmpxchg+0x1f0/0x1f0
> Jul 3 04:08:45 nfsvmf24 kernel: ? lockdep_init_map_type+0x2c3/0x7b0
> Jul 3 04:08:45 nfsvmf24 kernel: ? rwlock_bug.part.0+0x90/0x90
> Jul 3 04:08:45 nfsvmf24 kernel: btrfs_get_or_create_delayed_node+0x295/0x430
> Jul 3 04:08:45 nfsvmf24 kernel: btrfs_delayed_update_inode+0x24/0x670
> Jul 3 04:08:45 nfsvmf24 kernel: ? btrfs_check_and_init_root_item+0x1f0/0x1f0
> Jul 3 04:08:45 nfsvmf24 kernel: ? start_transaction+0x219/0x12d0
> Jul 3 04:08:45 nfsvmf24 kernel: btrfs_update_inode+0x1aa/0x430
> Jul 3 04:08:45 nfsvmf24 kernel: btrfs_dirty_inode+0xf7/0x1d0
> Jul 3 04:08:45 nfsvmf24 kernel: btrfs_setattr+0x928/0x1400
> Jul 3 04:08:45 nfsvmf24 kernel: ? mark_held_locks+0x9e/0xe0
> Jul 3 04:08:45 nfsvmf24 kernel: ? _raw_spin_unlock+0x29/0x40
> Jul 3 04:08:45 nfsvmf24 kernel: ? btrfs_cont_expand+0x9a0/0x9a0
> Jul 3 04:08:45 nfsvmf24 kernel: ? fh_update+0x1e0/0x1e0 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: ? current_time+0x88/0xd0
> Jul 3 04:08:45 nfsvmf24 kernel: ? timestamp_truncate+0x230/0x230
> Jul 3 04:08:45 nfsvmf24 kernel: notify_change+0x4b3/0xe40
> Jul 3 04:08:45 nfsvmf24 kernel: ? down_write_nested+0xd4/0x130
> Jul 3 04:08:45 nfsvmf24 kernel: ? nfsd_setattr+0x984/0xe00 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: nfsd_setattr+0x984/0xe00 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: ? lock_downgrade+0x680/0x680
> Jul 3 04:08:45 nfsvmf24 kernel: ? nfsd_permission+0x310/0x310 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: ? __mnt_want_write+0x192/0x270
> Jul 3 04:08:45 nfsvmf24 kernel: nfsd4_setattr+0x1df/0x3c0 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: ? percpu_counter_add_batch+0x79/0x130
> Jul 3 04:08:45 nfsvmf24 kernel: nfsd4_proc_compound+0xce8/0x23e0 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: nfsd_dispatch+0x4ed/0xc10 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: ? svc_reserve+0xb5/0x130 [sunrpc]
> Jul 3 04:08:45 nfsvmf24 kernel: svc_process_common+0xd3f/0x1b00 [sunrpc]
> Jul 3 04:08:45 nfsvmf24 kernel: ? svc_generic_rpcbind_set+0x4d0/0x4d0 [sunrpc]
> Jul 3 04:08:45 nfsvmf24 kernel: ? nfsd_svc+0xc50/0xc50 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: ? svc_sock_secure_port+0x37/0x50 [sunrpc]
> Jul 3 04:08:45 nfsvmf24 kernel: ? svc_recv+0x1096/0x2350 [sunrpc]
> Jul 3 04:08:45 nfsvmf24 kernel: svc_process+0x361/0x4f0 [sunrpc]
> Jul 3 04:08:45 nfsvmf24 kernel: nfsd+0x2d6/0x570 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
> Jul 3 04:08:45 nfsvmf24 kernel: kthread+0x2a1/0x340
> Jul 3 04:08:45 nfsvmf24 kernel: ? kthread_complete_and_exit+0x20/0x20
> Jul 3 04:08:45 nfsvmf24 kernel: ret_from_fork+0x22/0x30
> Jul 3 04:08:45 nfsvmf24 kernel: </TASK>
>
> I think the problem was introduced by commit 253bf57555e ("btrfs:
> turn delayed_nodes_tree into an XArray").
>
> The spin_lock(&root->inode_lock) is held while xa_insert() tries to
> allocate memory, which may block.

Yep. In this case we can actually call xa_insert() without holding that
spinlock; it is safe against other concurrent calls to
btrfs_get_or_create_delayed_node(), btrfs_get_delayed_node(),
btrfs_kill_delayed_inode_items(), etc.

However, looking at xa_insert() we have:

	xa_lock(xa);
	err = __xa_insert(xa, index, entry, gfp);
	xa_unlock(xa);

And xa_lock() is defined as:

	#define xa_lock(xa) spin_lock(&(xa)->xa_lock)

So we'll always be under a spinlock even if we change btrfs to not take
the root->inode_lock spinlock. This seems more like a general problem
outside btrfs' control, so CC'ing Willy to double check. We have
another place in btrfs that calls xa_insert() while holding a btrfs
spinlock too.

Thanks.

> -Dai