Re: apparent scaling problem with delegations

>From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
>> On Feb 8, 2024, at 3:26 PM, Charles Hedrick <hedrick@xxxxxxxxxxx> wrote:
>>
>> We just turned delegations on for two big NFS servers. One characteristic 
>> of our site is that we have lots of small files and lots of files open.
>>
>> On one server, CPU in system state went to 30%, and NFS performance ground 
>> to a halt. When I disabled delegations it came back. The other server was 
>> showing high CPU on nfsd, but not enough to disable the server, so I looked 
>> around. The server where delegations are still on is spending most of its
>> time in nfsd_file_lru_cb. That's not the case on the server where we've
>> disabled delegations. Here's typical perf top output:
>>
>> Overhead  Shared Object                                 Symbol
>>   44.87%  [kernel]                                      [k] __list_lru_walk_one
>>   13.18%  [kernel]                                      [k] native_queued_spin_lock_slowpath.part.0 
>>    7.24%  [kernel]                                      [k] nfsd_file_lru_cb
>>    2.61%  [kernel]                                      [k] sha1_transform
>>    0.99%  [kernel]                                      [k] __crypto_alg_lookup
>>    0.95%  [kernel]                                      [k] _raw_spin_lock
>>    0.89%  [kernel]                                      [k] memcpy_erms
>>    0.77%  [kernel]                                      [k] mutex_lock 
>>    0.65%  [kernel]                                      [k] svc_tcp_recvfrom   
>>
>> I looked at the code. I'm not clear whether there's a problem with GC'ing the
>> entries, or whether it's just being called too often (maybe a table is too small?).
>>
>> When I disabled delegations, it immediately stopped spending all that time 
>> in nfsd_file_lru_cb. The number of delegations started going down slowly.
>> I suspect our system needs a lot more delegations than the maximum table 
>> size, and it's thrashing. The sizes were about 40,000 and
>> 60,000 on the two machines.  Systems are 384 G and 768 G, respectively. 
>> The maximum number of delegations is smaller than I would have expected
>> based on comments in the code.
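
For reference, a rough sketch of the arithmetic behind that expectation. It assumes the comment in question is the "at most 4 delegations per megabyte of RAM" note above set_max_delegations() in fs/nfsd/nfs4state.c, and that "384 G" and "768 G" mean GiB with 4 KiB pages. The kernel bases the limit on nr_free_buffer_pages() rather than total RAM, so the real ceiling is somewhat lower than these figures.

```python
# Assumptions: 4 KiB pages (PAGE_SHIFT = 12) and the "4 delegations per
# megabyte of RAM" rule described in the set_max_delegations() comment.
# Total RAM stands in for nr_free_buffer_pages(), so this is an upper bound.
PAGE_SHIFT = 12


def max_delegations(ram_gib: int) -> int:
    """Estimate the delegation ceiling for a server with ram_gib GiB of RAM."""
    pages = (ram_gib << 30) >> PAGE_SHIFT
    # Same shift as the kernel expression: pages >> (20 - 2 - PAGE_SHIFT)
    return pages >> (20 - 2 - PAGE_SHIFT)


# Sizes reported in the message above, next to the estimated ceiling.
for ram_gib, reported in ((384, 40_000), (768, 60_000)):
    limit = max_delegations(ram_gib)
    print(f"{ram_gib} GiB: estimated limit {limit:,}, reported size ~{reported:,}")
```

Under those assumptions the estimate is in the millions for both machines, far above the reported 40,000 and 60,000.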

>When reporting such problems, please include the kernel version
>on your NFS servers. Some late 5.x kernels have known problems
>with the NFSD file cache.

My apologies. It's Ubuntu 5.15.0-91-generic, which is 5.15.131.
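
For anyone else collecting the same information: a minimal sketch of how both version strings can be read on an Ubuntu server. It assumes /proc/version_signature is present (an Ubuntu-specific file whose last field is the upstream stable version); the helper names are hypothetical, and the sample string in the usage line only illustrates the format.

```python
import platform
from pathlib import Path
from typing import Optional, Tuple


def parse_signature(text: str) -> Optional[str]:
    """Return the last whitespace-separated field, or None if text is empty."""
    fields = text.split()
    return fields[-1] if fields else None


def kernel_versions() -> Tuple[str, Optional[str]]:
    """Return (distro kernel release, upstream version if it can be found)."""
    release = platform.release()  # same string as `uname -r`
    sig = Path("/proc/version_signature")  # assumption: Ubuntu-only file
    upstream = parse_signature(sig.read_text()) if sig.is_file() else None
    return release, upstream


if __name__ == "__main__":
    release, upstream = kernel_versions()
    print(f"distro release: {release}")
    print(f"upstream version: {upstream or 'unknown (no version signature)'}")
    # Illustrative format only, not copied from a real server:
    print(parse_signature("Ubuntu 5.15.0-91-generic 5.15.131"))
```

On other distributions the second value comes back as None, and the distro release alone will have to do.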



