Re: still seeing single client NFS4ERR_DELAY / CB_RECALL

> On Aug 18, 2020, at 5:49 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> 
> On Tue, Aug 18, 2020 at 05:26:26PM -0400, Chuck Lever wrote:
>> 
>>> On Aug 17, 2020, at 6:20 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>> 
>>> On Sun, Aug 16, 2020 at 04:46:00PM -0400, Chuck Lever wrote:
>>> 
>>>> In order of application:
>>>> 
>>>> 5920afa3c85f ("nfsd: hook nfsd_commit up to the nfsd_file cache")
>>>> 961.68user 5252.40system 20:12.30elapsed 512%CPU, 2541 DELAY errors
>>>> These results are similar to v5.3.
>>>> 
>>>> fd4f83fd7dfb ("nfsd: convert nfs4_file->fi_fds array to use nfsd_files")
>>>> Does not build
>>>> 
>>>> eb82dd393744 ("nfsd: convert fi_deleg_file and ls_file fields to nfsd_file")
>>>> 966.92user 5425.47system 33:52.79elapsed 314%CPU, 1330 DELAY errors
>>>> 
>>>> Can you take a look and see if there's anything obvious?
>>> 
>>> Unfortunately nothing about the file cache code is very obvious to me.
>>> I'm looking at it....
>>> 
>>> It adds some new nfserr_jukebox returns in nfsd_file_acquire.  Those
>>> mostly look like kmalloc failures, the one I'm not sure about is the
>>> NFSD_FILE_HASHED check.
>>> 
>>> Or maybe it's the lease break there.
>> 
>> nfsd_file_acquire() always calls fh_verify() before it invokes nfsd_open().
>> Replacing nfs4_get_vfs_file's nfsd_open() call with nfsd_file_acquire() adds
>> almost 10 million fh_verify() calls to my test run.

More context for this number is in the raw data:

Before: 15,742,399 calls to fh_verify() on 5,575,986 RPCs or 2.8 per RPC
After: 24,857,521 calls to fh_verify() on 7,684,320 RPCs or 3.2 per RPC

That commit results in both more RPCs and more fh_verify() calls per RPC.
It is not a benign change for NFSv4.0.
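As a quick check of the ratios above, here is a minimal Python sketch that recomputes fh_verify() calls per RPC from the raw before/after counts, along with the absolute increases. The names (BEFORE, AFTER, calls_per_rpc) are illustrative only; the numbers are the ones quoted in this message. The extra fh_verify() calls come to about 9.1 million.

```python
# Raw counts from the test runs before and after the change.
BEFORE = {"fh_verify_calls": 15_742_399, "rpcs": 5_575_986}
AFTER = {"fh_verify_calls": 24_857_521, "rpcs": 7_684_320}


def calls_per_rpc(run: dict) -> float:
    """Average number of fh_verify() calls per RPC in one run."""
    return run["fh_verify_calls"] / run["rpcs"]


# Absolute growth in fh_verify() calls and in RPCs between the two runs.
extra_calls = AFTER["fh_verify_calls"] - BEFORE["fh_verify_calls"]
extra_rpcs = AFTER["rpcs"] - BEFORE["rpcs"]

print(f"before: {calls_per_rpc(BEFORE):.1f} fh_verify() calls per RPC")  # 2.8
print(f"after:  {calls_per_rpc(AFTER):.1f} fh_verify() calls per RPC")   # 3.2
print(f"extra fh_verify() calls: {extra_calls:,}")  # 9,115,122
print(f"extra RPCs: {extra_rpcs:,}")                # 2,108,334
```

Both factors grow at once: about 2.1 million more RPCs, each carrying a higher average number of fh_verify() calls.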


>> On my server, fh_verify() is quite expensive. Most of the cost is in the
>> prepare_creds() call.
> 
> Huh, interesting.
> 
> So you no longer think there's a difference in NFS4ERR_DELAY returns
> before and after?

There's a difference. With the bad commit, the number of DELAY errors drops
by roughly half.

Before: 2541 DELAY errors
After: 1330 DELAY errors

However, sometime after the bad commit, the number of DELAY errors during
the test goes back up to about 2700, but the elapsed time doesn't change.
The data suggests that the count of DELAY errors does not impact the
overall throughput of the test.
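A minimal Python sketch comparing the two measured runs quoted earlier in the thread (2541 DELAY errors in 20:12.30 elapsed versus 1330 in 33:52.79). It assumes the elapsed field is GNU time's "minutes:seconds" form with no hours component; the helper name elapsed_seconds is illustrative only. The DELAY count falls to about half while wall-clock time grows by about two thirds, which is consistent with DELAY errors not being what limits throughput.

```python
def elapsed_seconds(text: str) -> float:
    """Convert a "MM:SS.ss" elapsed field (no hours part) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


# Measurements quoted earlier in the thread.
runs = {
    "before": {"delay_errors": 2541, "elapsed": "20:12.30"},
    "after": {"delay_errors": 1330, "elapsed": "33:52.79"},
}

before, after = runs["before"], runs["after"]
delay_ratio = after["delay_errors"] / before["delay_errors"]
time_ratio = elapsed_seconds(after["elapsed"]) / elapsed_seconds(before["elapsed"])

print(f"DELAY errors: {delay_ratio:.0%} of the earlier count")  # 52%
print(f"elapsed time: {time_ratio:.0%} of the earlier run")     # 168%
```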


--
Chuck Lever






