On Mon, Nov 04, 2024 at 06:04:21PM +0000, Trond Myklebust wrote:
> On Sat, 2024-11-02 at 15:32 +0000, Chuck Lever III wrote:
> > Hi-
> > 
> > I've noticed that my nightly fstests runs have been hanging
> > for the past few days for the fs-current and fs-next and
> > linux-6.11.y kernels.
> > 
> > I checked in on one of the NFSv3 clients, and the console
> > had this, over and over (same with the NFSv4.0 client):
> > 
> > 
> > [23860.805728] watchdog: BUG: soft lockup - CPU#2 stuck for 17446s!
> > [751:691784]
> > [23860.806601] Modules linked in: nfsv3 nfs_acl nfs lockd grace
> > nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill nf_tables nfnetlink
> > siw intel_rapl_msr intel_rapl_common sunrpc ib_core kvm_intel
> > iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_pcsp snd_pcm kvm
> > snd_timer snd soundcore rapl i2c_i801 virtio_balloon lpc_ich
> > i2c_smbus virtio_net net_failover failover joydev loop fuse zram xfs
> > crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni
> > polyval_generic ghash_clmulni_intel virtio_console sha512_ssse3
> > virtio_blk serio_raw qemu_fw_cfg
> > [23860.812638] CPU: 2 UID: 0 PID: 691784 Comm: 751 Tainted:
> > G             L     6.12.0-rc5-g8c4f6fa04f3d #1
> > [23860.813529] Tainted: [L]=SOFTLOCKUP
> > [23860.813926] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS 1.16.3-2.fc40 04/01/2014
> > [23860.814745] RIP: 0010:_raw_spin_lock+0x1b/0x30
> > [23860.815218] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
> > 0f 1e fa 0f 1f 44 00 00 65 ff 05 48 02 ee 61 31 c0 ba 01 00 00 00 f0
> > 0f b1 17 <75> 05 c3 cc cc cc cc 89 c6 e8 77 04 00 00 90 c3 cc cc cc
> > cc 90 90
> > [23860.817076] RSP: 0018:ff55e5e888aef550 EFLAGS: 00000246
> > [23860.817687] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > 0000000000000002
> > [23860.818459] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> > ff29918f86f43ebc
> > [23860.819147] RBP: ff95ae8744fd8000 R08: 0000000000000000 R09:
> > 0000000000100000
> > [23860.819984] R10: 0000000000000000 R11: 000000000003ed8c R12:
> > ff29918f86f43ebc
> > [23860.820805] R13: ff95ae8744fd8000 R14: 0000000000000001 R15:
> > 0000000000000000
> > [23860.821602] FS: 00007f591e707740(0000) GS:ff299192afb00000(0000)
> > knlGS:0000000000000000
> > [23860.822464] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [23860.823030] CR2: 00007f6f6d2f1050 CR3: 0000000111ba0006 CR4:
> > 0000000000773ef0
> > [23860.823708] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [23860.824389] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [23860.825107] PKRU: 55555554
> > [23860.825406] Call Trace:
> > [23860.825721]  <IRQ>
> > [23860.825996]  ? watchdog_timer_fn+0x1e0/0x260
> > [23860.826434]  ? __pfx_watchdog_timer_fn+0x10/0x10
> > [23860.826902]  ? __hrtimer_run_queues+0x113/0x280
> > [23860.827362]  ? ktime_get+0x3e/0xf0
> > [23860.827781]  ? hrtimer_interrupt+0xfa/0x230
> > [23860.828283]  ? __sysvec_apic_timer_interrupt+0x55/0x120
> > [23860.828861]  ? sysvec_apic_timer_interrupt+0x6c/0x90
> > [23860.829356]  </IRQ>
> > [23860.829591]  <TASK>
> > [23860.829827]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> > [23860.830428]  ? _raw_spin_lock+0x1b/0x30
> > [23860.830842]  nfs_folio_find_head_request+0x29/0x90 [nfs]
> > [23860.831398]  nfs_lock_and_join_requests+0x64/0x270 [nfs]
> > [23860.831953]  nfs_page_async_flush+0x1b/0x1f0 [nfs]
> > [23860.832447]  nfs_wb_folio+0x1a4/0x290 [nfs]
> > [23860.832903]  nfs_release_folio+0x62/0xb0 [nfs]
> > [23860.833376]  split_huge_page_to_list_to_order+0x453/0x1140
> > [23860.833930]  split_huge_pages_write+0x601/0xb30
> > [23860.834421]  ? syscall_exit_to_user_mode+0x10/0x210
> > [23860.835000]  ? do_syscall_64+0x8e/0x160
> > [23860.835399]  ? inode_security+0x22/0x60
> > [23860.835810]  full_proxy_write+0x54/0x80
> > [23860.836211]  vfs_write+0xf8/0x450
> > [23860.836560]  ? __x64_sys_fcntl+0x9b/0xe0
> > [23860.837023]  ? syscall_exit_to_user_mode+0x10/0x210
> > [23860.837589]  ksys_write+0x6f/0xf0
> > [23860.837950]  do_syscall_64+0x82/0x160
> > [23860.838396]  ? __x64_sys_fcntl+0x9b/0xe0
> > [23860.838871]  ? syscall_exit_to_user_mode+0x10/0x210
> > [23860.839433]  ? do_syscall_64+0x8e/0x160
> > [23860.839910]  ? syscall_exit_to_user_mode+0x10/0x210
> > [23860.840481]  ? do_syscall_64+0x8e/0x160
> > [23860.840950]  ? get_close_on_exec+0x34/0x40
> > [23860.841363]  ? do_fcntl+0x2d0/0x730
> > [23860.841727]  ? __x64_sys_fcntl+0x9b/0xe0
> > [23860.842168]  ? syscall_exit_to_user_mode+0x10/0x210
> > [23860.842684]  ? do_syscall_64+0x8e/0x160
> > [23860.843084]  ? clear_bhb_loop+0x25/0x80
> > [23860.843490]  ? clear_bhb_loop+0x25/0x80
> > [23860.843897]  ? clear_bhb_loop+0x25/0x80
> > [23860.844271]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [23860.844845] RIP: 0033:0x7f591e812f64
> > [23860.845242] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
> > 84 00 00 00 00 00 f3 0f 1e fa 80 3d 05 74 0d 00 00 74 13 b8 01 00 00
> > 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec
> > 20 48 89
> > [23860.847175] RSP: 002b:00007ffc19051998 EFLAGS: 00000202 ORIG_RAX:
> > 0000000000000001
> > [23860.847923] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
> > 00007f591e812f64
> > [23860.848702] RDX: 0000000000000002 RSI: 0000562caf357b00 RDI:
> > 0000000000000001
> > [23860.849402] RBP: 00007ffc190519c0 R08: 0000000000000073 R09:
> > 0000000000000001
> > [23860.850081] R10: 0000000000000000 R11: 0000000000000202 R12:
> > 0000000000000002
> > [23860.850794] R13: 0000562caf357b00 R14: 00007f591e8e35c0 R15:
> > 00007f591e8e0f00
> > [23860.851592]  </TASK>
> > 
> 
> I suspect it might be commit b571cfcb9dca ("nfs: don't reuse partially
> completed requests in nfs_lock_and_join_requests").
> 
> That patch appears to assume that if one request is complete, then the
> others will complete too before unlocking. I don't think that is a
> valid assumption, since other requests could hit a non-fatal error or a
> short write that would cause them not to complete.
> 
> So can you please try reverting only that patch and seeing if that
> fixes the problem?

I was not able to report publicly yesterday evening, but I did report
this privately to Trond and Anna.

Reverting b571cfcb9dca causes earlier breakage -- the test run does not
even get to the soft lockup without that commit applied.

-- 
Chuck Lever