Re: apparent scaling problem with delegations

On Thu, 8 Feb 2024 at 23:06, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
>
>
> > On Feb 8, 2024, at 4:45 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> >
> >> From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> >>> On Feb 8, 2024, at 3:26 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
> >>>
> >>> We just turned delegations on for two big NFS servers. One characteristic
> >>> of our site is that we have lots of small files and lots of files open.
> >>>
> >>> On one server, CPU in system state went to 30%, and NFS performance ground
> >>> to a halt. When I disabled delegations it came back. The other server was
> >>> showing high CPU on nfsd, but not enough to disable the server, so I looked
> >>> around. The server where delegations are still on is spending most of its time
> >>> in nfsd_file_lru_cb. That's not the case with the server where we've disabled
> >>> delegations. Here's a typical perf top:
> >>>
> >>> Overhead  Shared Object                                 Symbol
> >>>    44.87%  [kernel]                                      [k] __list_lru_walk_one
> >>>    13.18%  [kernel]                                      [k] native_queued_spin_lock_slowpath.part.0
> >>>     7.24%  [kernel]                                      [k] nfsd_file_lru_cb
> >>>     2.61%  [kernel]                                      [k] sha1_transform
> >>>     0.99%  [kernel]                                      [k] __crypto_alg_lookup
> >>>     0.95%  [kernel]                                      [k] _raw_spin_lock
> >>>     0.89%  [kernel]                                      [k] memcpy_erms
> >>>     0.77%  [kernel]                                      [k] mutex_lock
> >>>     0.65%  [kernel]                                      [k] svc_tcp_recvfrom
> >>>
> >>> I looked at the code. I'm not clear whether there's a problem with GC'ing the
> >>> entries, or it's just being called too often (maybe a table is too small?)
> >>>
> >>> When I disabled delegations, it immediately stopped spending all that time
> >>> in nfsd_file_lru_cb. The number of delegations started going down slowly.
> >>> I suspect our system needs a lot more delegations than the maximum table
> >>> size, and it's thrashing. The sizes were about 40,000 and
> >>> 60,000 on the two machines.  Systems are 384 G and 768 G, respectively.
> >>> The maximum number of delegations is smaller than I would have expected
> >>> based on comments in the code.
> >
> >> When reporting such problems, please include the kernel version
> >> on your NFS servers. Some late 5.x kernels have known problems
> >> with the NFSD file cache.
> >
> > My apologies. Ubuntu 5.15.0-91-generic, which is 5.15.131.
>
> That kernel is likely to have file cache issues with symptoms
> very much as you described above. The issues are thought to
> be addressed by kernel release v6.2.

Is there a way to turn the file cache off for nfsd?

Dan
-- 
Dan Shelton - Cluster Specialist Win/Lin/Bsd