Re: Soft lockup with fstests on NFSv3 and NFSv4.0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 2024-11-02 at 15:32 +0000, Chuck Lever III wrote:
> Hi-
> 
> I've noticed that my nightly fstests runs have been hanging
> for the past few days for the fs-current and fs-next and
> linux-6.11.y kernels.
> 
> I checked in on one of the NFSv3 clients, and the console
> had this, over and over (same with the NFSv4.0 client):
> 
> 
> [23860.805728] watchdog: BUG: soft lockup - CPU#2 stuck for 17446s!
> [751:691784]
> [23860.806601] Modules linked in: nfsv3 nfs_acl nfs lockd grace
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill nf_tables nfnetlink
> siw intel_rapl_msr intel_rapl_common sunrpc ib_core kvm_intel
> iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_pcsp snd_pcm kvm
> snd_timer snd soundcore rapl i2c_i801 virtio_balloon lpc_ich
> i2c_smbus virtio_net net_failover failover joydev loop fuse zram xfs
> crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni
> polyval_generic ghash_clmulni_intel virtio_console sha512_ssse3
> virtio_blk serio_raw qemu_fw_cfg
> [23860.812638] CPU: 2 UID: 0 PID: 691784 Comm: 751 Tainted:
> G             L     6.12.0-rc5-g8c4f6fa04f3d #1
> [23860.813529] Tainted: [L]=SOFTLOCKUP
> [23860.813926] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.3-2.fc40 04/01/2014
> [23860.814745] RIP: 0010:_raw_spin_lock+0x1b/0x30
> [23860.815218] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
> 0f 1e fa 0f 1f 44 00 00 65 ff 05 48 02 ee 61 31 c0 ba 01 00 00 00 f0
> 0f b1 17 <75> 05 c3 cc cc cc cc 89 c6 e8 77 04 00 00 90 c3 cc cc cc
> cc 90 90
> [23860.817076] RSP: 0018:ff55e5e888aef550 EFLAGS: 00000246
> [23860.817687] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 0000000000000002
> [23860.818459] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> ff29918f86f43ebc
> [23860.819147] RBP: ff95ae8744fd8000 R08: 0000000000000000 R09:
> 0000000000100000
> [23860.819984] R10: 0000000000000000 R11: 000000000003ed8c R12:
> ff29918f86f43ebc
> [23860.820805] R13: ff95ae8744fd8000 R14: 0000000000000001 R15:
> 0000000000000000
> [23860.821602] FS:  00007f591e707740(0000) GS:ff299192afb00000(0000)
> knlGS:0000000000000000
> [23860.822464] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [23860.823030] CR2: 00007f6f6d2f1050 CR3: 0000000111ba0006 CR4:
> 0000000000773ef0
> [23860.823708] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [23860.824389] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [23860.825107] PKRU: 55555554
> [23860.825406] Call Trace:
> [23860.825721]  <IRQ>
> [23860.825996]  ? watchdog_timer_fn+0x1e0/0x260
> [23860.826434]  ? __pfx_watchdog_timer_fn+0x10/0x10
> [23860.826902]  ? __hrtimer_run_queues+0x113/0x280
> [23860.827362]  ? ktime_get+0x3e/0xf0
> [23860.827781]  ? hrtimer_interrupt+0xfa/0x230
> [23860.828283]  ? __sysvec_apic_timer_interrupt+0x55/0x120
> [23860.828861]  ? sysvec_apic_timer_interrupt+0x6c/0x90
> [23860.829356]  </IRQ>
> [23860.829591]  <TASK>
> [23860.829827]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [23860.830428]  ? _raw_spin_lock+0x1b/0x30
> [23860.830842]  nfs_folio_find_head_request+0x29/0x90 [nfs]
> [23860.831398]  nfs_lock_and_join_requests+0x64/0x270 [nfs]
> [23860.831953]  nfs_page_async_flush+0x1b/0x1f0 [nfs]
> [23860.832447]  nfs_wb_folio+0x1a4/0x290 [nfs]
> [23860.832903]  nfs_release_folio+0x62/0xb0 [nfs]
> [23860.833376]  split_huge_page_to_list_to_order+0x453/0x1140
> [23860.833930]  split_huge_pages_write+0x601/0xb30
> [23860.834421]  ? syscall_exit_to_user_mode+0x10/0x210
> [23860.835000]  ? do_syscall_64+0x8e/0x160
> [23860.835399]  ? inode_security+0x22/0x60
> [23860.835810]  full_proxy_write+0x54/0x80
> [23860.836211]  vfs_write+0xf8/0x450
> [23860.836560]  ? __x64_sys_fcntl+0x9b/0xe0
> [23860.837023]  ? syscall_exit_to_user_mode+0x10/0x210
> [23860.837589]  ksys_write+0x6f/0xf0
> [23860.837950]  do_syscall_64+0x82/0x160
> [23860.838396]  ? __x64_sys_fcntl+0x9b/0xe0
> [23860.838871]  ? syscall_exit_to_user_mode+0x10/0x210
> [23860.839433]  ? do_syscall_64+0x8e/0x160
> [23860.839910]  ? syscall_exit_to_user_mode+0x10/0x210
> [23860.840481]  ? do_syscall_64+0x8e/0x160
> [23860.840950]  ? get_close_on_exec+0x34/0x40
> [23860.841363]  ? do_fcntl+0x2d0/0x730
> [23860.841727]  ? __x64_sys_fcntl+0x9b/0xe0
> [23860.842168]  ? syscall_exit_to_user_mode+0x10/0x210
> [23860.842684]  ? do_syscall_64+0x8e/0x160
> [23860.843084]  ? clear_bhb_loop+0x25/0x80
> [23860.843490]  ? clear_bhb_loop+0x25/0x80
> [23860.843897]  ? clear_bhb_loop+0x25/0x80
> [23860.844271]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [23860.844845] RIP: 0033:0x7f591e812f64
> [23860.845242] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 f3 0f 1e fa 80 3d 05 74 0d 00 00 74 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec
> 20 48 89
> [23860.847175] RSP: 002b:00007ffc19051998 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [23860.847923] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
> 00007f591e812f64
> [23860.848702] RDX: 0000000000000002 RSI: 0000562caf357b00 RDI:
> 0000000000000001
> [23860.849402] RBP: 00007ffc190519c0 R08: 0000000000000073 R09:
> 0000000000000001
> [23860.850081] R10: 0000000000000000 R11: 0000000000000202 R12:
> 0000000000000002
> [23860.850794] R13: 0000562caf357b00 R14: 00007f591e8e35c0 R15:
> 00007f591e8e0f00
> [23860.851592]  </TASK>
> 

I suspect it might be commit b571cfcb9dca ("nfs: don't reuse partially
completed requests in nfs_lock_and_join_requests").
That patch appears to assume that if one request is complete, then the
others will complete too before unlocking. I don't think that is a
valid assumption, since other requests could hit a non-fatal error or a
short write that would cause them not to complete.

So can you please try reverting only that patch and seeing if that
fixes the problem?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux