>From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
>> On Feb 8, 2024, at 3:26 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
>>
>> We just turned delegations on for two big NFS servers. One characteristic
>> of our site is that we have lots of small files and lots of files open.
>>
>> On one server, CPU in system state went to 30%, and NFS performance ground
>> to a halt. When I disabled delegations it came back. The other server was
>> showing high CPU on nfsd, but not enough to disable the server, so I looked
>> around. The server where delegations are still on is spending most of its time
>> in nfsd_file_lru_cb. That's not the case with the server where we've disabled
>> delegations. Here's a typical perf top:
>>
>> Overhead  Shared Object  Symbol
>>   44.87%  [kernel]       [k] __list_lru_walk_one
>>   13.18%  [kernel]       [k] native_queued_spin_lock_slowpath.part.0
>>    7.24%  [kernel]       [k] nfsd_file_lru_cb
>>    2.61%  [kernel]       [k] sha1_transform
>>    0.99%  [kernel]       [k] __crypto_alg_lookup
>>    0.95%  [kernel]       [k] _raw_spin_lock
>>    0.89%  [kernel]       [k] memcpy_erms
>>    0.77%  [kernel]       [k] mutex_lock
>>    0.65%  [kernel]       [k] svc_tcp_recvfrom
>>
>> I looked at the code. I'm not clear whether there's a problem with GC'ing the
>> entries, or it's just being called too often (maybe a table is too small?)
>>
>> When I disabled delegations, it immediately stopped spending all that time
>> in nfsd_file_lru_cb. The number of delegations started going down slowly.
>> I suspect our system needs a lot more delegations than the maximum table
>> size, and it's thrashing. The sizes were about 40,000 and
>> 60,000 on the two machines. Systems are 384 G and 768 G, respectively.
>> The maximum number of delegations is smaller than I would have expected
>> based on comments in the code.
>
>When reporting such problems, please include the kernel version
>on your NFS servers. Some late 5.x kernels have known problems
>with the NFSD file cache.

My apologies. Ubuntu 5.15.0-91-generic, which is 5.15.131.