Re: [syzbot] [xfs?] WARNING in __queue_delayed_work

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 10 Apr 2023 18:15:57 -0700

On Mon, Apr 10, 2023 at 08:20:22PM +0800, Hillf Danton wrote:
> On 07 Apr 2023 10:04:49 -0700
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    7e364e56293b Linux 6.3-rc5
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13241195c80000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=e3b9dc6616d797bb
> > dashboard link: https://syzkaller.appspot.com/bug?extid=5ed016962f5137a09c7c
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > userspace arch: i386
> > 
> > Unfortunately, I don't have any reproducer for this issue yet.
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+5ed016962f5137a09c7c@xxxxxxxxxxxxxxxxxxxxxxxxx
> > 
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444
> > Modules linked in:
> > CPU: 1 PID: 102 Comm: kswapd0 Not tainted 6.3.0-rc5-syzkaller #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> > RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444

Gross.  I just got one of these splats too, in:

__queue_work+0x3a2/0x4a0
call_timer_fn+0x24/0x120
__run_timers.part.0+0x170/0x280
run_timer_softirq+0x31/0x60
__do_softirq+0xf6/0x2fd
irq_exit_rcu+0xc5/0x110
sysvec_apic_timer_interrupt+0x8e/0xc0

I guess someone might have done:

Thread 0: xfs_inodegc_queue_all(mp);

Thread 1: <add inodegc work>
Thread 1: mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, <nonzero>);

Thread 0: drain_workqueue(mp->m_inodegc_wq);

<Timer fires, splats, everyone halts>

But I can't really tell, the VM froze.  Any suggestions on how to fix this?

--D

> > Code: e0 07 83 c0 03 38 d0 7c 09 84 d2 74 05 e8 74 0c 81 00 8b 5b 2c 31 ff 83 e3 20 89 de e8 c5 fb 2f 00 85 db 75 42 e8 6c ff 2f 00 <0f> 0b e9 3c f9 ff ff e8 60 ff 2f 00 0f 0b e9 ce f8 ff ff e8 54 ff
> > RSP: 0000:ffffc90000ce7638 EFLAGS: 00010093
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: ffff888015f73a80 RSI: ffffffff8152d854 RDI: 0000000000000005
> > RBP: 0000000000000002 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffe8ffffb03348
> > R13: ffff888078462000 R14: ffffe8ffffb03390 R15: ffff888078462000
> > FS:  0000000000000000(0000) GS:ffff88802ca80000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000cfa5bb CR3: 0000000025fde000 CR4: 0000000000150ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  __queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672
> >  mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746
> >  xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline]
> >  xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191
> >  do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853
> >  shrink_slab+0x175/0x660 mm/vmscan.c:1013
> >  shrink_one+0x502/0x810 mm/vmscan.c:5343
> >  shrink_many mm/vmscan.c:5394 [inline]
> >  lru_gen_shrink_node mm/vmscan.c:5511 [inline]
> >  shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
> >  kswapd_shrink_node mm/vmscan.c:7262 [inline]
> >  balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
> >  kswapd+0x677/0xd60 mm/vmscan.c:7712
> >  kthread+0x2e8/0x3a0 kernel/kthread.c:376
> >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
> >  </TASK>
> 
> Looks like a valid race.
> 
> 	xfs_inodegc_shrinker_scan()		xfs_inodegc_stop()
> 	---					---
> 	if (!xfs_is_inodegc_enabled(mp))
> 		return SHRINK_STOP;
> 						if (!xfs_clear_inodegc_enabled(mp))
> 							return;
> 						xfs_inodegc_queue_all(mp);
> 						drain_workqueue(mp->m_inodegc_wq);
> 							wq->flags |= __WQ_DRAINING;
> 	mod_delayed_work_on()