On Thu, 8 Feb 2024 at 23:06, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
> On Feb 8, 2024, at 4:45 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> >
> >> From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> >>> On Feb 8, 2024, at 3:26 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> >>>
> >>> We just turned delegations on for two big NFS servers. One characteristic
> >>> of our site is that we have lots of small files and lots of files open.
> >>>
> >>> On one server, CPU in system state went to 30%, and NFS performance ground
> >>> to a halt. When I disabled delegations it came back. The other server was
> >>> showing high CPU on nfsd, but not enough to disable the server, so I looked
> >>> around. The server where delegations are still on is spending most of its time
> >>> in nfsd_file_lru_cb. That's not the case with the server where we've disabled
> >>> delegations. Here's a typical perf top
> >>>
> >>>   Overhead  Shared Object  Symbol
> >>>     44.87%  [kernel]       [k] __list_lru_walk_one
> >>>     13.18%  [kernel]       [k] native_queued_spin_lock_slowpath.part.0
> >>>      7.24%  [kernel]       [k] nfsd_file_lru_cb
> >>>      2.61%  [kernel]       [k] sha1_transform
> >>>      0.99%  [kernel]       [k] __crypto_alg_lookup
> >>>      0.95%  [kernel]       [k] _raw_spin_lock
> >>>      0.89%  [kernel]       [k] memcpy_erms
> >>>      0.77%  [kernel]       [k] mutex_lock
> >>>      0.65%  [kernel]       [k] svc_tcp_recvfrom
> >>>
> >>> I looked at the code. I'm not clear whether there's a problem with GC'ing the
> >>> entries, or it's just being called too often (maybe a table is too small?)
> >>>
> >>> When I disabled delegations, it immediately stopped spending all that time
> >>> in nfsd_file_lru_cb. The number of delegations started going down slowly.
> >>> I suspect our system needs a lot more delegations than the maximum table
> >>> size, and it's thrashing. The sizes were about 40,000 and
> >>> 60,000 on the two machines. Systems are 384 G and 768 G, respectively.
> >>> The maximum number of delegations is smaller than I would have expected
> >>> based on comments in the code.
> >
> >> When reporting such problems, please include the kernel version
> >> on your NFS servers. Some late 5.x kernels have known problems
> >> with the NFSD file cache.
> >
> > My apologies. Ubuntu 5.15.0-91-generic, which is 5.15.131.
>
> That kernel is likely to have file cache issues with symptoms
> very much as you described above. The issues are thought to
> be addressed by kernel release v6.2.

Is there a way to turn the file cache off for nfsd?

Dan
--
Dan Shelton - Cluster Specialist Win/Lin/Bsd