Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/30/21 7:32 AM, Bruce Fields wrote:
On Mon, Nov 29, 2021 at 11:13:34PM -0800, dai.ngo@xxxxxxxxxx wrote:
Just to be clear, the problem of pynfs with 4.0 is that the server takes
~55 secs to expire 1026 4.0 courteous clients, which comes out to ~50ms
per client. This causes the test to time out in waiting for RPC reply of
the OPEN that triggers the conflicts.

I don't know exactly where the time spent in the process of expiring a
client. But as Bruce mentioned, it could be related to the time to access
/var/lib/nfs to remove the client's persistent record.
Could you try something like

	strace -r -$(pidof) -oTRACE

Strace does not have any info that shows where the server spent time when
expiring client state. The client record is removed by nfsd4_umh_cltrack_remove
doing upcall to user space helper /sbin/nfsdcltrack to do the job. I used
the low-tech debug tool, printk, to measure the time spent by
nfsd4_client_record_remove. Here is a sample of the output, START and END
are in milliseconds:

Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d418] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d459] clp[ffff888119206040] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d461] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d48e] clp[ffff888119206740] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: START [0x15d49c] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0]
Nov 30 12:31:04 localhost kernel: nfsd4_client_record_remove: END [0x15d4c5] clp[ffff88811b54e000] client_tracking_ops[ffffffffa04bc2e0]

The average time spent to remove the client record is about ~50ms, matches
with the time reported by pynfs test. This confirms what Bruce suspected
earlier.

-Dai


and maybe we could take a look at TRACE?  My hope would be that there'd
be a clear set of syscalls whose time, multiplied by 1026, explains most
of that 55 seconds.  Then it might be worth checking whether there are
any easy optimizations possible.

--b.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux