Re: apparent scaling problem with delegations

On Sat, Feb 17, 2024 at 12:31:34AM +0100, Dan Shelton wrote:
> On Thu, 8 Feb 2024 at 23:06, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >
> >
> >
> > > On Feb 8, 2024, at 4:45 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> > >
> > >> From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> > >>> On Feb 8, 2024, at 3:26 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> > >>>
> > >>> We just turned delegations on for two big NFS servers. One characteristic
> > >>> of our site is that we have lots of small files and lots of files open.
> > >>>
> > >>> On one server, CPU in system state went to 30%, and NFS performance ground
> > >>> to a halt. When I disabled delegations it came back. The other server was
> > >>> showing high CPU on nfsd, but not enough to disable the server, so I looked
> > >>> around. The server where delegations are still on is spending most of its time
> > >>> in nfsd_file_lru_cb. That's not the case with the server where we've disabled
> > >>> delegations. Here's typical perf top output:
> > >>>
> > >>> Overhead  Shared Object                                 Symbol
> > >>>    44.87%  [kernel]                                      [k] __list_lru_walk_one
> > >>>    13.18%  [kernel]                                      [k] native_queued_spin_lock_slowpath.part.0
> > >>>     7.24%  [kernel]                                      [k] nfsd_file_lru_cb
> > >>>     2.61%  [kernel]                                      [k] sha1_transform
> > >>>     0.99%  [kernel]                                      [k] __crypto_alg_lookup
> > >>>     0.95%  [kernel]                                      [k] _raw_spin_lock
> > >>>     0.89%  [kernel]                                      [k] memcpy_erms
> > >>>     0.77%  [kernel]                                      [k] mutex_lock
> > >>>     0.65%  [kernel]                                      [k] svc_tcp_recvfrom
> > >>>
> > >>> I looked at the code. I'm not clear whether there's a problem with GC'ing the
> > >>> entries, or it's just being called too often (maybe a table is too small?)
> > >>>
> > >>> When I disabled delegations, it immediately stopped spending all that time
> > >>> in nfsd_file_lru_cb. The number of delegations started going down slowly.
> > >>> I suspect our system needs a lot more delegations than the maximum table
> > >>> size, and it's thrashing. The sizes were about 40,000 and
> > >>> 60,000 on the two machines.  Systems are 384 G and 768 G, respectively.
> > >>> The maximum number of delegations is smaller than I would have expected
> > >>> based on comments in the code.
> > >
> > >> When reporting such problems, please include the kernel version
> > >> on your NFS servers. Some late 5.x kernels have known problems
> > >> with the NFSD file cache.
> > >
> > > My apologies. Ubuntu 5.15.0-91-generic, which is 5.15.131.
> >
> > That kernel is likely to have file cache issues with symptoms
> > very much as you described above. The issues are thought to
> > be addressed by kernel release v6.2.
> 
> Is there a way to turn the file cache off for nfsd?

It is integrated into the operation of NFSD, so it cannot be
disabled.
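
For anyone who wants to check a fleet of servers against the v6.2
threshold mentioned earlier in this thread, here is a minimal sketch.
It assumes the release string starts with "major.minor" (as the Ubuntu
string quoted above does). The names parse_release and predates_fix are
hypothetical helpers, not part of any NFS tooling. Distribution kernels
may carry backported fixes, so treat the result as a rough indicator
only.

```python
import re

# Release named in this thread as the one thought to address the
# NFSD file cache issues. Taken from the discussion, not from a changelog.
FIXED_IN = (6, 2)


def parse_release(release):
    """Return (major, minor) from a string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def predates_fix(release, fixed_in=FIXED_IN):
    """True if the kernel's major.minor is older than fixed_in."""
    return parse_release(release) < fixed_in
```

On the server itself, predates_fix(platform.release()) (after
"import platform") would apply the same check to the running kernel.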


-- 
Chuck Lever



