Re: 50% regression in NFS direct WRITE throughput

> On May 29, 2020, at 9:02 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> 
> While testing other things, I noticed that several iozone tests showed
> a significant regression in large direct WRITE performance with little
> to no drop in small WRITE IOPS.
> 
> One example (NFS/RDMA on FDR InfiniBand):
> 
> 	Machine = Linux manet.1015granger.net 5.7.0-rc7-00033-g8de6ca0614d4 #1071 SMP
> 	CPU utilization Resolution = 0.000 seconds.
> 	CPU utilization Excel chart enabled
> 	File size set to 1048576 kB
> 	Record Size 256 kB
> 	O_DIRECT feature enabled
> 	Command line used: /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r256k -t12 -I
> 	Output is in kBytes/sec
> 	Time Resolution = 0.000001 seconds.
> 	Processor cache size set to 1024 kBytes.
> 	Processor cache line size set to 32 bytes.
> 	File stride size set to 17 * record size.
> 	Throughput test with 12 processes
> 	Each process writes a 1048576 kByte file in 256 kByte records
> 
> 	Children see throughput for 12 initial writers 	= 2430898.66 kB/sec
> 	Parent sees throughput for 12 initial writers 	= 2425731.85 kB/sec
> 	Min throughput per process 			=  202025.03 kB/sec
> 	Max throughput per process 			=  202899.33 kB/sec
> 	Avg throughput per process 			=  202574.89 kB/sec
> 	Min xfer 					= 1044224.00 kB
> 	CPU Utilization: Wall time    5.179    CPU time    2.020    CPU utilization  39.00 %
> 
> 	Children see throughput for 12 rewriters 	= 2431774.06 kB/sec
> 	Parent sees throughput for 12 rewriters 	= 2431230.83 kB/sec
> 	Min throughput per process 			=  202230.42 kB/sec
> 	Max throughput per process 			=  202926.08 kB/sec
> 	Avg throughput per process 			=  202647.84 kB/sec
> 	Min xfer 					= 1045248.00 kB
> 	CPU utilization: Wall time    5.169    CPU time    2.015    CPU utilization  38.99 %
> 
> These numbers are half what they usually are.
> 
> I bisected between v5.6 and v5.7-rc7, and it terminated on 1f28476dcb98
> ("NFS: Fix O_DIRECT commit verifier handling").
> 
> This commit doesn't revert cleanly -- the kernel won't build after it is
> reverted, so I can't easily do the obvious test to confirm the bisect
> result.
> 
> I intend to look into the exact pathology, but wanted to get this regression
> reported first, in case someone has a thought about what is slowing things
> down.

The observed behavior is that the client sends every WRITE twice: once as
an UNSTABLE WRITE plus a COMMIT, and once as a FILE_SYNC WRITE.

This is because the nfs_write_match_verf() check in nfs_direct_commit_complete()
fails for every on-the-wire WRITE.

Buffered writes use nfs_write_completion(), which sets req->wb_verf correctly.

Direct writes use nfs_direct_write_completion(), which does not set req->wb_verf
at all. This leaves req->wb_verf set to all zeroes for every direct WRITE, and
thus nfs_direct_commit_complete() always requests a resend.
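To make the failure mode concrete, here is a small stand-alone sketch
(user-space C with hypothetical struct and helper names, not the kernel
code): a verifier that was never filled in stays all zeroes, so it can
never match the verifier the server actually returned, and the commit
completion path concludes a resend is required.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the 8-byte NFS write verifier. */
struct demo_verf {
	unsigned char data[8];
};

/* Stand-in for an nfs_write_match_verf()-style check: equal bytes mean
 * the data survived on the server and no resend is needed. */
static bool demo_verf_match(const struct demo_verf *stored,
			    const struct demo_verf *from_server)
{
	return memcmp(stored->data, from_server->data,
		      sizeof(stored->data)) == 0;
}

int main(void)
{
	/* Direct path today: the wb_verf equivalent is never filled in. */
	struct demo_verf stored = { { 0 } };
	/* What the server returned with the UNSTABLE WRITE reply. */
	struct demo_verf from_server = { { 0x4e, 0x46, 0x53, 0x76,
					   0x65, 0x72, 0x66, 0x21 } };

	if (!demo_verf_match(&stored, &from_server))
		printf("verifier mismatch -> resend as FILE_SYNC WRITE\n");
	return 0;
}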

I confirmed all of this by adding temporary tracepoints to the write completion
paths. It seems the fix is either to duplicate the guts of nfs_write_completion()
in nfs_direct_write_completion(), or to refactor those guts into helpers that
both functions invoke.
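Something roughly like the following could serve as the shared piece (an
untested sketch only: nfs_record_commit_verf() is a name made up here, and
the hdr->verf / req->wb_verf usage is lifted from how the buffered-write
completion path appears to work):

/* Sketch of a helper that both nfs_write_completion() and
 * nfs_direct_write_completion() could call after a WRITE completes.
 * It records the server's verifier in the request so that the later
 * nfs_write_match_verf() check in the COMMIT completion path has
 * something real to compare against.
 */
static void nfs_record_commit_verf(struct nfs_pgio_header *hdr,
				   struct nfs_page *req)
{
	/* Only writes the server did not commit to stable storage
	 * need a verifier for the follow-up COMMIT. */
	if (hdr->verf.committed != NFS_FILE_SYNC)
		memcpy(&req->wb_verf, &hdr->verf.verifier,
		       sizeof(req->wb_verf));
}

nfs_direct_write_completion() would then call this before queuing the
request for commit, so the zero-verifier mismatch goes away.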


--
Chuck Lever






