[bug report] kernel 5.18.0-rc4 oops with invalid wait context

dai.ngo@xxxxxxxxxx · Tue, 3 May 2022 12:11:00 -0700

Hi,

I just noticed there were couple of oops in my Oracle6 in nfs4.dev network.
I'm not sure who ran which tests (be useful to know) that caused these oops.

Here is the stack traces:

[286123.154006] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1585
[286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3983, name: nfsd
[286123.155872] preempt_count: 1, expected: 0
[286123.156443] RCU nest depth: 0, expected: 0
[286123.156771] 1 lock held by nfsd/3983:
[286123.156786]  #0: ffff888006762520 (&clp->cl_lock){+.+.}-{2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
[286123.156949] Preemption disabled at:
[286123.156961] [<0000000000000000>] 0x0
[286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not tainted 5.18.0-rc4+ #1
[286123.157539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[286123.157552] Call Trace:
[286123.157565]  <TASK>
[286123.157581]  dump_stack_lvl+0x57/0x7d
[286123.157609]  __might_resched.cold+0x222/0x26b
[286123.157644]  down_read_nested+0x68/0x420
[286123.157671]  ? down_write_nested+0x130/0x130
[286123.157686]  ? rwlock_bug.part.0+0x90/0x90
[286123.157705]  ? rcu_read_lock_sched_held+0x81/0xb0
[286123.157749]  xfs_file_fsync+0x3b9/0x820
[286123.157776]  ? lock_downgrade+0x680/0x680
[286123.157798]  ? xfs_filemap_pfn_mkwrite+0x10/0x10
[286123.157823]  ? nfsd_file_put+0x100/0x100 [nfsd]
[286123.157921]  nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
[286123.158007]  nfsd_file_put+0x79/0x100 [nfsd]
[286123.158088]  check_for_locks+0x152/0x200 [nfsd]
[286123.158191]  nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
[286123.158393]  ? nfsd4_locku+0xd10/0xd10 [nfsd]
[286123.158488]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.158525]  nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
[286123.158699]  nfsd_dispatch+0x4ed/0xc30 [nfsd]
[286123.158974]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.159010]  svc_process_common+0xd8e/0x1b20 [sunrpc]
[286123.159043]  ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
[286123.159043]  ? nfsd_svc+0xc50/0xc50 [nfsd]
[286123.159043]  ? svc_sock_secure_port+0x27/0x40 [sunrpc]
[286123.159043]  ? svc_recv+0x1100/0x2390 [sunrpc]
[286123.159043]  svc_process+0x361/0x4f0 [sunrpc]
[286123.159043]  nfsd+0x2d6/0x570 [nfsd]
[286123.159043]  ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
[286123.159043]  kthread+0x29f/0x340
[286123.159043]  ? kthread_complete_and_exit+0x20/0x20
[286123.159043]  ret_from_fork+0x22/0x30
[286123.159043]  </TASK>
[286123.187052] BUG: scheduling while atomic: nfsd/3983/0x00000002
[286123.187551] INFO: lockdep is turned off.
[286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc
[286123.188527] Preemption disabled at:
[286123.188535] [<0000000000000000>] 0x0
[286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Tainted: G        W         5.18.0-rc4+ #1
[286123.190233] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[286123.190910] Call Trace:
[286123.190910]  <TASK>
[286123.190910]  dump_stack_lvl+0x57/0x7d
[286123.190910]  __schedule_bug.cold+0x133/0x143
[286123.190910]  __schedule+0x16c9/0x20a0
[286123.190910]  ? schedule_timeout+0x314/0x510
[286123.190910]  ? __sched_text_start+0x8/0x8
[286123.190910]  ? internal_add_timer+0xa4/0xe0
[286123.190910]  schedule+0xd7/0x1f0
[286123.190910]  schedule_timeout+0x319/0x510
[286123.190910]  ? rcu_read_lock_held_common+0xe/0xa0
[286123.190910]  ? usleep_range_state+0x150/0x150
[286123.190910]  ? lock_acquire+0x331/0x490
[286123.190910]  ? init_timer_on_stack_key+0x50/0x50
[286123.190910]  ? do_raw_spin_lock+0x116/0x260
[286123.190910]  ? rwlock_bug.part.0+0x90/0x90
[286123.190910]  io_schedule_timeout+0x26/0x80
[286123.190910]  wait_for_completion_io_timeout+0x16a/0x340
[286123.190910]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910]  ? wait_for_completion+0x330/0x330
[286123.190910]  submit_bio_wait+0x135/0x1d0
[286123.190910]  ? submit_bio_wait_endio+0x40/0x40
[286123.190910]  ? xfs_iunlock+0xd5/0x300
[286123.190910]  ? bio_init+0x295/0x470
[286123.190910]  blkdev_issue_flush+0x69/0x80
[286123.190910]  ? blk_unregister_queue+0x1e0/0x1e0
[286123.190910]  ? bio_kmalloc+0x90/0x90
[286123.190910]  ? xfs_iunlock+0x1b4/0x300
[286123.190910]  xfs_file_fsync+0x354/0x820
[286123.190910]  ? lock_downgrade+0x680/0x680
[286123.190910]  ? xfs_filemap_pfn_mkwrite+0x10/0x10
[286123.190910]  ? nfsd_file_put+0x100/0x100 [nfsd]
[286123.190910]  nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
[286123.190910]  nfsd_file_put+0x79/0x100 [nfsd]
[286123.190910]  check_for_locks+0x152/0x200 [nfsd]
[286123.190910]  nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
[286123.190910]  ? nfsd4_locku+0xd10/0xd10 [nfsd]
[286123.190910]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910]  nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
[286123.190910]  nfsd_dispatch+0x4ed/0xc30 [nfsd]
[286123.190910]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910]  svc_process_common+0xd8e/0x1b20 [sunrpc]
[286123.190910]  ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
[286123.190910]  ? nfsd_svc+0xc50/0xc50 [nfsd]
[286123.190910]  ? svc_sock_secure_port+0x27/0x40 [sunrpc]
[286123.190910]  ? svc_recv+0x1100/0x2390 [sunrpc]
[286123.190910]  svc_process+0x361/0x4f0 [sunrpc]
[286123.190910]  nfsd+0x2d6/0x570 [nfsd]
[286123.190910]  ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
[286123.190910]  kthread+0x29f/0x340
[286123.190910]  ? kthread_complete_and_exit+0x20/0x20
[286123.190910]  ret_from_fork+0x22/0x30
[286123.190910]  </TASK>

The problem is the process tries to sleep while holding the
cl_lock by nfsd4_release_lockowner. I think the problem was
introduced with the filemap_flush in nfsd_file_put since
'b6669305d35a nfsd: Reduce the number of calls to nfsd_file_gc()'.
The filemap_flush is later replaced by nfsd_file_flush by
'6b8a94332ee4f nfsd: Fix a write performance regression'.

-Dai