Backporting NFSD filecache improvements to longterm maintenance kernel release

Hello,

Context: We manage a large number of Linux file servers running nfsd that sustain high aggregate throughput (up to 10 GB/s) and reasonably high IOPS (up to ~1 million) on LTS kernel 5.10.

For some time now we have seen CPU lockups during nfsd laundromat cleanup/GC/shutdown, and generally poor performance when there is a 'large' number of open NFSv4 files. After some investigation, it's easy to find that these issues were addressed by the "Overhaul NFSD filecache" patch series (https://lore.kernel.org/linux-nfs/165730437087.28142.6731645688073512500.stgit@xxxxxxxxxxxxxxxxxxxxx/), which can currently be found in the 6.x mainline kernels.

Another impact this issue has on stability, which I have not seen mentioned in other submissions, is on nfs-server shutdown. The time it takes to shut down nfsd scales non-linearly with the number of open NFSv4 files. Once I pass ~75,000 open NFSv4 files, I am practically guaranteed to trip the 'soft lockup' watchdog threshold during shutdown on my machine.

Shutdown times:
# 50,000 Open Files: 13 seconds
Feb 13 14:23:51 ip-198-19-24-243.ec2.internal systemd[1]: Stopping NFS server and services...
Feb 13 14:24:04 ip-198-19-24-243.ec2.internal systemd[1]: Stopped NFS server and services.
# 75,000 Open Files: 31 seconds
Feb 13 14:43:47 ip-198-19-24-243.ec2.internal systemd[1]: Stopping NFS server and services...
Feb 13 14:44:18 ip-198-19-24-243.ec2.internal systemd[1]: Stopped NFS server and services.
# 100,000 Open Files: 55 seconds
Feb 13 13:47:39 ip-198-19-24-243.ec2.internal systemd[1]: Stopping NFS server and services...
Feb 13 13:48:34 ip-198-19-24-243.ec2.internal systemd[1]: Stopped NFS server and services.
# 125,000 Open Files: 89 seconds
Feb 13 15:01:13 ip-198-19-24-243.ec2.internal systemd[1]: Stopping NFS server and services...
Feb 13 15:02:42 ip-198-19-24-243.ec2.internal systemd[1]: Stopped NFS server and services.
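For reference, the durations above come straight from the journal timestamps. A small sketch (hypothetical helper, assuming only the syslog-style systemd log format shown above) of how they were computed:

```python
from datetime import datetime

def shutdown_duration(stopping_line, stopped_line, year=2024):
    """Return seconds elapsed between a systemd 'Stopping ...' and
    'Stopped ...' journal line (syslog-style 'Mon DD HH:MM:SS' stamps,
    which carry no year, so one must be supplied)."""
    def parse(line):
        # First three whitespace-separated fields are month, day, time.
        stamp = " ".join(line.split()[:3])
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return (parse(stopped_line) - parse(stopping_line)).total_seconds()

# The 100,000-open-files case from the log excerpt above:
print(shutdown_duration(
    "Feb 13 13:47:39 host systemd[1]: Stopping NFS server and services...",
    "Feb 13 13:48:34 host systemd[1]: Stopped NFS server and services."))
# 55.0
```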

CPU lockup message:
Feb 13 13:48:31 ip-198-19-24-243 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [nfsd:31932]

Stack trace during CPU lockup:
Feb 13 13:48:31 ip-198-19-24-243 kernel: RIP: 0010:__list_lru_walk_one+0xa8/0x150
Feb 13 13:48:31 ip-198-19-24-243 kernel: Code: 85 c0 0f 84 b0 00 00 00 41 8b 04 24 85 c0 0f 84 b5 00 00 00 48 83 44 24 10 01 49 83 6c 24 28 01 eb a5 83 f8 03 75 2f 48 8b 03 <49> 89 dd 49 39 df 74 0c 48 89 c3 48 8b 45 00 48 85 c0 75 9e 48 8b
Feb 13 13:48:31 ip-198-19-24-243 kernel: RSP: 0018:ffffb76c46e7fbf8 EFLAGS: 00000246
Feb 13 13:48:31 ip-198-19-24-243 kernel: RAX: ffff98f4e7e8aef0 RBX: ffff98f4e7b52c50 RCX: ffffb76c46e7fc98
Feb 13 13:48:31 ip-198-19-24-243 kernel: RDX: ffff98f47cc5ff40 RSI: ffff98f47cc5ff48 RDI: ffff98f47cc5ff48
Feb 13 13:48:31 ip-198-19-24-243 kernel: RBP: ffffb76c46e7fc90 R08: ffff98f4e7b527f0 R09: ffffb76c46e7fc98
Feb 13 13:48:31 ip-198-19-24-243 kernel: R10: 0000000000000001 R11: 0000000000038400 R12: ffff98f47cc5ff40
Feb 13 13:48:31 ip-198-19-24-243 kernel: R13: ffff98f4e7b527f0 R14: ffffb76c46e7fc98 R15: ffff98f47cc5ff48
Feb 13 13:48:31 ip-198-19-24-243 kernel: FS: 0000000000000000(0000) GS:ffff98f52ee00000(0000) knlGS:0000000000000000
Feb 13 13:48:31 ip-198-19-24-243 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 13 13:48:31 ip-198-19-24-243 kernel: CR2: 00007fddf92fafcc CR3: 000000010b34c004 CR4: 00000000007706f0
Feb 13 13:48:31 ip-198-19-24-243 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 13 13:48:31 ip-198-19-24-243 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 13 13:48:31 ip-198-19-24-243 kernel: PKRU: 55555554
Feb 13 13:48:31 ip-198-19-24-243 kernel: Call Trace:
Feb 13 13:48:31 ip-198-19-24-243 kernel: <IRQ>
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? show_trace_log_lvl+0x1c1/0x2d9
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? show_trace_log_lvl+0x1c1/0x2d9
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? list_lru_walk_node+0x56/0xe0
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? lockup_detector_update_enable+0x50/0x50
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? watchdog_timer_fn+0x1bb/0x210
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? __run_hrtimer+0x5c/0x190
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? __hrtimer_run_queues+0x86/0xe0
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? hrtimer_interrupt+0x110/0x2c0
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? __sysvec_apic_timer_interrupt+0x5c/0xe0
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? asm_call_irq_on_stack+0xf/0x20
Feb 13 13:48:31 ip-198-19-24-243 kernel: </IRQ>
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? sysvec_apic_timer_interrupt+0x72/0x80
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? asm_sysvec_apic_timer_interrupt+0x12/0x20
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? __list_lru_walk_one+0xa8/0x150
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? __list_lru_walk_one+0x74/0x150
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? nfsd_file_lru_count+0xa0/0xa0 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? nfsd_file_lru_count+0xa0/0xa0 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: list_lru_walk_node+0x56/0xe0
Feb 13 13:48:31 ip-198-19-24-243 kernel: nfsd_file_lru_walk_list+0x168/0x190 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: release_all_access+0x6a/0x80 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: nfs4_free_ol_stateid+0x22/0x60 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: free_ol_stateid_reaplist+0x59/0xa0 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: release_openowner+0x178/0x1b0 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: __destroy_client+0x157/0x230 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: nfs4_state_destroy_net+0x82/0x190 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: nfs4_state_shutdown_net+0x129/0x160 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: nfsd_last_thread+0x102/0x130 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: nfsd_destroy+0x3c/0x60 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: nfsd+0x126/0x140 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? nfsd_shutdown_threads+0x80/0x80 [nfsd]
Feb 13 13:48:31 ip-198-19-24-243 kernel: kthread+0x118/0x140
Feb 13 13:48:31 ip-198-19-24-243 kernel: ? __kthread_bind_mask+0x60/0x60
Feb 13 13:48:31 ip-198-19-24-243 kernel: ret_from_fork+0x1f/0x30

Before we spend more time investigating, I thought I'd first ask whether the maintainers would be open to reviewing a set of patches that backport the NFSD filecache improvements to LTS kernel 5.10. From my perspective, these patches are core to nfsd being performant and stable with NFSv4. The changes in the original patch series are large, but from what I can tell they have been relatively bug-free since being merged into mainline.

I believe we would not be the only ones to benefit if these changes were backported to a 5.x LTS kernel. It appears others have attempted to backport some of these changes to their own 5.x kernels (see https://marc.info/?l=linux-kernel&m=167286008910652&w=2 and https://marc.info/?l=linux-nfs&m=169269659416487&w=2). Both submissions indicate that issues were encountered after backporting; the latter mentions that a later patch (https://marc.info/?l=linux-nfs&m=167293078213110&w=2) resolved them. However, I'm unsure whether that later patch is needed, since LTS kernel 6.1 still lacks this commit. These two examples give us some hesitation about backporting these changes without assistance/guidance.

Also, a mandatory thank you to Chuck Lever and others for implementing these filecache improvements in the first place.

Regards,
Daniel Perry





