Re: [bug report] kernel 5.18.0-rc4 oops with invalid wait context

dai.ngo@xxxxxxxxxx · Tue, 10 May 2022 11:56:15 -0700

On 5/10/22 7:24 AM, Chuck Lever III wrote:

On May 3, 2022, at 3:11 PM, dai.ngo@xxxxxxxxxx wrote:

Hi,

I just noticed there were couple of oops in my Oracle6 in nfs4.dev network.
I'm not sure who ran which tests (be useful to know) that caused these oops.

Here is the stack traces:

[286123.154006] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1585
[286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3983, name: nfsd
[286123.155872] preempt_count: 1, expected: 0
[286123.156443] RCU nest depth: 0, expected: 0
[286123.156771] 1 lock held by nfsd/3983:
[286123.156786]  #0: ffff888006762520 (&clp->cl_lock){+.+.}-{2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
[286123.156949] Preemption disabled at:
[286123.156961] [<0000000000000000>] 0x0
[286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not tainted 5.18.0-rc4+ #1
[286123.157539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[286123.157552] Call Trace:
[286123.157565]  <TASK>
[286123.157581]  dump_stack_lvl+0x57/0x7d
[286123.157609]  __might_resched.cold+0x222/0x26b
[286123.157644]  down_read_nested+0x68/0x420
[286123.157671]  ? down_write_nested+0x130/0x130
[286123.157686]  ? rwlock_bug.part.0+0x90/0x90
[286123.157705]  ? rcu_read_lock_sched_held+0x81/0xb0
[286123.157749]  xfs_file_fsync+0x3b9/0x820
[286123.157776]  ? lock_downgrade+0x680/0x680
[286123.157798]  ? xfs_filemap_pfn_mkwrite+0x10/0x10
[286123.157823]  ? nfsd_file_put+0x100/0x100 [nfsd]
[286123.157921]  nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
[286123.158007]  nfsd_file_put+0x79/0x100 [nfsd]
[286123.158088]  check_for_locks+0x152/0x200 [nfsd]
[286123.158191]  nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
[286123.158393]  ? nfsd4_locku+0xd10/0xd10 [nfsd]
[286123.158488]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.158525]  nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
[286123.158699]  nfsd_dispatch+0x4ed/0xc30 [nfsd]
[286123.158974]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.159010]  svc_process_common+0xd8e/0x1b20 [sunrpc]
[286123.159043]  ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
[286123.159043]  ? nfsd_svc+0xc50/0xc50 [nfsd]
[286123.159043]  ? svc_sock_secure_port+0x27/0x40 [sunrpc]
[286123.159043]  ? svc_recv+0x1100/0x2390 [sunrpc]
[286123.159043]  svc_process+0x361/0x4f0 [sunrpc]
[286123.159043]  nfsd+0x2d6/0x570 [nfsd]
[286123.159043]  ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
[286123.159043]  kthread+0x29f/0x340
[286123.159043]  ? kthread_complete_and_exit+0x20/0x20
[286123.159043]  ret_from_fork+0x22/0x30
[286123.159043]  </TASK>
[286123.187052] BUG: scheduling while atomic: nfsd/3983/0x00000002
[286123.187551] INFO: lockdep is turned off.
[286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc
[286123.188527] Preemption disabled at:
[286123.188535] [<0000000000000000>] 0x0
[286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Tainted: G        W         5.18.0-rc4+ #1
[286123.190233] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[286123.190910] Call Trace:
[286123.190910]  <TASK>
[286123.190910]  dump_stack_lvl+0x57/0x7d
[286123.190910]  __schedule_bug.cold+0x133/0x143
[286123.190910]  __schedule+0x16c9/0x20a0
[286123.190910]  ? schedule_timeout+0x314/0x510
[286123.190910]  ? __sched_text_start+0x8/0x8
[286123.190910]  ? internal_add_timer+0xa4/0xe0
[286123.190910]  schedule+0xd7/0x1f0
[286123.190910]  schedule_timeout+0x319/0x510
[286123.190910]  ? rcu_read_lock_held_common+0xe/0xa0
[286123.190910]  ? usleep_range_state+0x150/0x150
[286123.190910]  ? lock_acquire+0x331/0x490
[286123.190910]  ? init_timer_on_stack_key+0x50/0x50
[286123.190910]  ? do_raw_spin_lock+0x116/0x260
[286123.190910]  ? rwlock_bug.part.0+0x90/0x90
[286123.190910]  io_schedule_timeout+0x26/0x80
[286123.190910]  wait_for_completion_io_timeout+0x16a/0x340
[286123.190910]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910]  ? wait_for_completion+0x330/0x330
[286123.190910]  submit_bio_wait+0x135/0x1d0
[286123.190910]  ? submit_bio_wait_endio+0x40/0x40
[286123.190910]  ? xfs_iunlock+0xd5/0x300
[286123.190910]  ? bio_init+0x295/0x470
[286123.190910]  blkdev_issue_flush+0x69/0x80
[286123.190910]  ? blk_unregister_queue+0x1e0/0x1e0
[286123.190910]  ? bio_kmalloc+0x90/0x90
[286123.190910]  ? xfs_iunlock+0x1b4/0x300
[286123.190910]  xfs_file_fsync+0x354/0x820
[286123.190910]  ? lock_downgrade+0x680/0x680
[286123.190910]  ? xfs_filemap_pfn_mkwrite+0x10/0x10
[286123.190910]  ? nfsd_file_put+0x100/0x100 [nfsd]
[286123.190910]  nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
[286123.190910]  nfsd_file_put+0x79/0x100 [nfsd]
[286123.190910]  check_for_locks+0x152/0x200 [nfsd]
[286123.190910]  nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
[286123.190910]  ? nfsd4_locku+0xd10/0xd10 [nfsd]
[286123.190910]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910]  nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
[286123.190910]  nfsd_dispatch+0x4ed/0xc30 [nfsd]
[286123.190910]  ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910]  svc_process_common+0xd8e/0x1b20 [sunrpc]
[286123.190910]  ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
[286123.190910]  ? nfsd_svc+0xc50/0xc50 [nfsd]
[286123.190910]  ? svc_sock_secure_port+0x27/0x40 [sunrpc]
[286123.190910]  ? svc_recv+0x1100/0x2390 [sunrpc]
[286123.190910]  svc_process+0x361/0x4f0 [sunrpc]
[286123.190910]  nfsd+0x2d6/0x570 [nfsd]
[286123.190910]  ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
[286123.190910]  kthread+0x29f/0x340
[286123.190910]  ? kthread_complete_and_exit+0x20/0x20
[286123.190910]  ret_from_fork+0x22/0x30
[286123.190910]  </TASK>

The problem is the process tries to sleep while holding the
cl_lock by nfsd4_release_lockowner. I think the problem was
introduced with the filemap_flush in nfsd_file_put since
'b6669305d35a nfsd: Reduce the number of calls to nfsd_file_gc()'.
The filemap_flush is later replaced by nfsd_file_flush by
'6b8a94332ee4f nfsd: Fix a write performance regression'.
That seems plausible, given the traces above.

But it begs the question: why was a vfs_fsync() needed by
RELEASE_LOCKOWNER in this case? I've tried to reproduce the
problem, and even added a might_sleep() call in nfsd_file_flush()
but haven't been able to reproduce.

I haven't found a way to force the NFSD_FILE_HASHED bit to be
cleared when the client sends RELEASE_LOCKOWNER so nfsd_file_flush
is called. But if you add might_sleep() to nfsd_file_put() and
run pynfs 4.0 RLOWN1 test, the problem will happen.

Since this was 5.18-rc4, would you open a bug report on
bugzilla.linux-nfs.org and copy in all this detail?

will do. Note that the kernel on Oracle6 is 5.18-rc4 with
Courteous server patches v24 but I don't think it's related
to this problem.

-Dai

--
Chuck Lever