On Sat, Feb 17, 2024 at 12:31:34AM +0100, Dan Shelton wrote:
> On Thu, 8 Feb 2024 at 23:06, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >
> > > On Feb 8, 2024, at 4:45 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> > >
> > >> From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> > >>> On Feb 8, 2024, at 3:26 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> > >>>
> > >>> We just turned delegations on for two big NFS servers. One characteristic
> > >>> of our site is that we have lots of small files and lots of files open.
> > >>>
> > >>> On one server, CPU in system state went to 30%, and NFS performance ground
> > >>> to a halt. When I disabled delegations it came back. The other server was
> > >>> showing high CPU on nfsd, but not enough to disable the server, so I looked
> > >>> around. The server where delegations are still on is spending most of its time
> > >>> in nfsd_file_lru_cb. That's not the case with the server where we've disabled
> > >>> delegations. Here's a typical perf top
> > >>>
> > >>> Overhead  Shared Object  Symbol
> > >>>   44.87%  [kernel]       [k] __list_lru_walk_one
> > >>>   13.18%  [kernel]       [k] native_queued_spin_lock_slowpath.part.0
> > >>>    7.24%  [kernel]       [k] nfsd_file_lru_cb
> > >>>    2.61%  [kernel]       [k] sha1_transform
> > >>>    0.99%  [kernel]       [k] __crypto_alg_lookup
> > >>>    0.95%  [kernel]       [k] _raw_spin_lock
> > >>>    0.89%  [kernel]       [k] memcpy_erms
> > >>>    0.77%  [kernel]       [k] mutex_lock
> > >>>    0.65%  [kernel]       [k] svc_tcp_recvfrom
> > >>>
> > >>> I looked at the code. I'm not clear whether there's a problem with GC'ing the
> > >>> entries, or it's just being called too often (maybe a table is too small?)
> > >>>
> > >>> When I disabled delegations, it immediately stopped spending all that time
> > >>> in nfsd_file_lru_cb. The number of delegations starting going down slowly.
> > >>> I suspect our system needs a lot more delegations than the maximum table
> > >>> size, and it's thrashing. The sizes were about 40,000 and
> > >>> 60,000 on the two machines. Systems are 384 G and 768 G, respectively.
> > >>> The maximum number of delegations is smaller than I would have expected
> > >>> based on comments in the code.
> > >
> > >> When reporting such problems, please include the kernel version
> > >> on your NFS servers. Some late 5.x kernels have known problems
> > >> with the NFSD file cache.
> > >
> > > My apologies. Ubuntu 5.15.0-91-generic, which is 5.15.131.
> >
> > That kernel is likely to have file cache issues with symptoms
> > very much as you described above. The issues are thought to
> > be addressed by kernel release v6.2.
>
> Is there a way to turn the file cache off for nfsd?

It is integrated into the operation of NFSD, so it cannot be disabled.

--
Chuck Lever
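
For anyone checking their own servers against the v6.2 figure quoted above, here is a minimal sketch in Python. It only compares the upstream major.minor numbers in a kernel release string. The helper names (parse_release, predates_fix) are hypothetical, and the v6.2 threshold is taken solely from the statement in this thread. Distribution kernels may carry backported fixes, so treat the result as a rough hint, not a diagnosis.

```python
import platform
import re

# Assumption: the v6.2 threshold comes only from the statement in this thread
# that the file cache issues are thought to be addressed by that release.
FIXED_IN = (6, 2)


def parse_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def predates_fix(release: str) -> bool:
    """Return True when the release's major.minor is lower than FIXED_IN."""
    return parse_release(release) < FIXED_IN


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    release = platform.release()
    if predates_fix(release):
        print(f"{release}: older than v6.2, may be affected")
    else:
        print(f"{release}: v6.2 or later")
```

Run on the NFS server itself; the kernel reported earlier in the thread (5.15.0-91-generic) would be flagged as older than v6.2.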