Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working


 



On 01/02/18 08:29, Terry Barnaby wrote:
On 01/02/18 01:34, Jeremy Linton wrote:
On 01/31/2018 09:49 AM, J. Bruce Fields wrote:
On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:
Have you tried this with a '-o nfsvers=3' during mount? Did that help?

I noticed a large slowdown in my kernel builds across NFS/LAN a while back after a machine/kernel/10G upgrade. After playing with mount/export options, filesystem tuning, etc., I got to the point of timing a bunch of these operations against the older machine, at which point I discovered that simply backing down to NFSv3 solved the problem.

AKA an NFSv3 server on a 10-year-old 4-disk XFS RAID5 on 1Gb Ethernet was faster than a modern machine with an 8-disk XFS RAID5 on 10Gb on NFSv4. The effect was enough to change a kernel build from ~45 minutes down to less than 5.

Using NFSv3 in async mode is faster than NFSv4 in async mode (still abysmal in sync mode).

NFSv3 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)

real    2m25.717s
user    0m8.739s
sys     0m13.362s

NFSv4 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)

real    3m33.032s
user    0m8.506s
sys     0m16.930s

NFSv3 async: wireshark trace

No.     Time           Source          Destination     Protocol Length Info
  18527 2.815884979    192.168.202.2   192.168.202.1   NFS      216    V3 CREATE Call (Reply In 18528), DH: 0x62f39428/dma.h Mode: EXCLUSIVE
  18528 2.816362338    192.168.202.1   192.168.202.2   NFS      328    V3 CREATE Reply (Call In 18527)
  18529 2.816418841    192.168.202.2   192.168.202.1   NFS      224    V3 SETATTR Call (Reply In 18530), FH: 0x13678ba0
  18530 2.816871820    192.168.202.1   192.168.202.2   NFS      216    V3 SETATTR Reply (Call In 18529)
  18531 2.816966771    192.168.202.2   192.168.202.1   NFS      1148   V3 WRITE Call (Reply In 18532), FH: 0x13678ba0 Offset: 0 Len: 934 FILE_SYNC
  18532 2.817441291    192.168.202.1   192.168.202.2   NFS      208    V3 WRITE Reply (Call In 18531) Len: 934 FILE_SYNC
  18533 2.817495775    192.168.202.2   192.168.202.1   NFS      236    V3 SETATTR Call (Reply In 18534), FH: 0x13678ba0
  18534 2.817920346    192.168.202.1   192.168.202.2   NFS      216    V3 SETATTR Reply (Call In 18533)
  18535 2.818002910    192.168.202.2   192.168.202.1   NFS      216    V3 CREATE Call (Reply In 18536), DH: 0x62f39428/elf.h Mode: EXCLUSIVE
  18536 2.818492126    192.168.202.1   192.168.202.2   NFS      328    V3 CREATE Reply (Call In 18535)

This is taking about 2ms for a small file write rather than 3ms for NFSv4. There is an extra GETATTR and CLOSE RPC in NFSv4 accounting for the difference.
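
As a sanity check on the 2ms figure, here is a small Python sketch that recomputes the per-RPC round trip and the per-file time from the NFSv3 trace timestamps. The variable and function names are my own and purely illustrative; the only inputs are the timestamps copied from the trace.

```python
# (operation, call time, reply time) in seconds, copied from the NFSv3 trace
# for one small file (dma.h): CREATE, SETATTR, WRITE, SETATTR.
rpcs = [
    ("CREATE",  2.815884979, 2.816362338),
    ("SETATTR", 2.816418841, 2.816871820),
    ("WRITE",   2.816966771, 2.817441291),
    ("SETATTR", 2.817495775, 2.817920346),
]
# CREATE call for the next file (elf.h), which marks the end of this file's work.
next_create_call = 2.818002910


def round_trip_ms(call_time, reply_time):
    """Round-trip time of one RPC in milliseconds."""
    return (reply_time - call_time) * 1000.0


latencies_ms = [round_trip_ms(call, reply) for _, call, reply in rpcs]
mean_rtt_ms = sum(latencies_ms) / len(latencies_ms)
per_file_ms = (next_create_call - rpcs[0][1]) * 1000.0

for (name, _, _), ms in zip(rpcs, latencies_ms):
    print(f"{name:8s} {ms:.3f} ms")
print(f"mean RPC round trip: {mean_rtt_ms:.3f} ms")
print(f"time per small file: {per_file_ms:.3f} ms")
```

Each RPC comes out a little under 0.5ms and the four of them together account for nearly all of the roughly 2ms per file, so the time is dominated by sequential round trips rather than by data transfer.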

So where I am:

1. NFS in sync mode, at least on my two Fedora27 systems and for my usage, is completely unusable (sync: 2 hours, async: 3 minutes, local disk: 13 seconds).

2. NFS async mode is working, but the small writes are still very slow.

3. NFS in async mode is 30% better with NFSv3 than NFSv4 when writing small files due to the increased latency caused by NFSv4's two extra RPC calls.
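
For reference, the "30%" figure can be checked against the two tar timings above. This is only a quick sketch of the arithmetic; the helper name is made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a 'real' time reported by time(1) as XmY.YYYs into seconds."""
    return minutes * 60 + seconds


nfsv3_s = to_seconds(2, 25.717)  # NFSv3 async: real 2m25.717s
nfsv4_s = to_seconds(3, 33.032)  # NFSv4 async: real 3m33.032s

# Two ways of expressing the same gap:
time_saved_pct = (nfsv4_s - nfsv3_s) / nfsv4_s * 100  # NFSv3 relative to NFSv4
extra_time_pct = (nfsv4_s - nfsv3_s) / nfsv3_s * 100  # NFSv4 relative to NFSv3

print(f"NFSv3 takes {time_saved_pct:.1f}% less time than NFSv4")
print(f"NFSv4 takes {extra_time_pct:.1f}% more time than NFSv3")
```

So "30% better" here means NFSv3 needs about a third less wall-clock time; put the other way round, NFSv4 takes a bit under half as long again.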

I really think that in 2018 we should be able to have better NFS performance when writing many small files, as is typical in software development. Some improvements to the NFS protocol would dramatically speed up any system using NFS with this sort of workload, and reduce power usage as well.

I don't know in detail whether this would work, or who is responsible for NFS, but it would be good, if possible, to have some improvements (NFSv4.3 ?). Maybe:

1. Have an OPEN-SETATTR-WRITE RPC call all in one and a SETATTR-CLOSE call all in one. This would reduce the latency of a small file write to 1ms rather than 3ms, i.e. about 66% less time. It would require the client to delay the OPEN/SETATTR until the first WRITE. Not sure how possible this is in the implementations. Maybe READs could be improved as well, but getting the OPEN through quickly may be better in this case ?

2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call. (0.5ms vs 3ms).

3. On sync/async modes, personally I think it would be better for the client to request sync or async mode at mount time. Setting sync on the server side would just enforce sync mode for all clients. If the server is in the default async mode, clients could mount using sync or async according to their requirements. This seems to match normal VFS semantics and usage patterns better.

4. The 0.5ms RPC latency seems a bit high (ICMP pings take 0.12ms). Maybe this is worth investigating in the Linux kernel processing (how ?).

5. The 20ms RPC latency I see in sync mode needs looking at on my system, although async mode is fine for my usage. Maybe this ends up as 2 x 10ms drive seeks on ext4 and is thus expected.
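
The latency estimates in points 1 and 2 follow from a very simple model: time per small file is roughly the number of sequential RPC round trips times the per-RPC latency. A rough Python sketch of that model is below. It assumes the RPCs for one file are issued strictly one after another and that server-side work is negligible; the count of six RPCs for NFSv4 is inferred from the numbers above (the four NFSv3 calls plus the extra GETATTR and CLOSE), and all names are illustrative.

```python
RPC_LATENCY_MS = 0.5  # approximate per-RPC round trip seen in async mode

# Sequential RPC round trips needed to write one small file.
# The first two rows describe the current behaviour, the last two the proposals.
SCHEMES = {
    "NFSv4 today": 6,  # inferred: 4 NFSv3-style calls + GETATTR + CLOSE
    "NFSv3 today": 4,  # CREATE, SETATTR, WRITE, SETATTR
    "OPEN-SETATTR-WRITE + SETATTR-CLOSE": 2,
    "OPEN-SETATTR-WRITE-CLOSE": 1,
}


def per_file_ms(rpc_count, rpc_latency_ms=RPC_LATENCY_MS):
    """Estimated time per small file, assuming strictly sequential RPCs."""
    return rpc_count * rpc_latency_ms


def reduction_pct(old_ms, new_ms):
    """Percentage of time saved going from old_ms to new_ms."""
    return (old_ms - new_ms) / old_ms * 100.0


baseline_ms = per_file_ms(SCHEMES["NFSv4 today"])
for name, rpc_count in SCHEMES.items():
    ms = per_file_ms(rpc_count)
    saved = reduction_pct(baseline_ms, ms)
    print(f"{name:36s} {ms:4.1f} ms/file  {saved:5.1f}% less than NFSv4 today")
```

The model ignores pipelining and any batching the client might already do, so it is only a first-order estimate, but it reproduces the 3ms, 2ms, 1ms and 0.5ms figures quoted above.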

Yet another poor NFS performance issue. If I do an "ls -lR" of a certain NFS mounted directory over a slow link (NFS over OpenVPN over FTTP 80/20Mbps), just after mounting the file system (default NFSv4 mount with async), it takes about 9 seconds. If I run the same "ls -lR" again, just after, it takes about 60 seconds. So much for caching ! I have noticed Makefile based builds (over Ethernet 1Gbps) taking a long time, with a second or so between each directory; I think this may be why.

Listing the directory using an NFSv3 mount takes 67 seconds on the first run after mounting and about the same on subsequent runs. No noticeable caching (default mount options with async). At least NFSv4 is fast the first time !

NFSv4 directory reads after mount:

No.     Time           Source          Destination     Protocol Length Info
    667 4.560833210    192.168.202.2   192.168.201.1   NFS      304    V4 Call (Reply In 672) READDIR FH: 0xde55a546
    668 4.582809439    192.168.201.1   192.168.202.2   TCP      1405   2049 → 679 [ACK] Seq=304477 Ack=45901 Win=1452 Len=1337 TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
    669 4.582986377    192.168.201.1   192.168.202.2   TCP      1405   2049 → 679 [ACK] Seq=305814 Ack=45901 Win=1452 Len=1337 TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
    670 4.583003805    192.168.202.2   192.168.201.1   TCP      68     679 → 2049 [ACK] Seq=45901 Ack=307151 Win=1444 Len=0 TSval=913651376 TSecr=2646321616
    671 4.583265423    192.168.201.1   192.168.202.2   TCP      1405   2049 → 679 [ACK] Seq=307151 Ack=45901 Win=1452 Len=1337 TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
    672 4.583280603    192.168.201.1   192.168.202.2   NFS      289    V4 Reply (Call In 667) READDIR
    673 4.583291818    192.168.202.2   192.168.201.1   TCP      68     679 → 2049 [ACK] Seq=45901 Ack=308709 Win=1444 Len=0 TSval=913651377 TSecr=2646321616
    674 4.583819172    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 675) GETATTR FH: 0xb91bfde7
    675 4.605389953    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 674) GETATTR
    676 4.605491075    192.168.202.2   192.168.201.1   NFS      288    V4 Call (Reply In 677) ACCESS FH: 0xb91bfde7, [Check: RD LU MD XT DL]
    677 4.626848306    192.168.201.1   192.168.202.2   NFS      240    V4 Reply (Call In 676) ACCESS, [Allowed: RD LU MD XT DL]
    678 4.626993773    192.168.202.2   192.168.201.1   NFS      304    V4 Call (Reply In 679) READDIR FH: 0xb91bfde7
    679 4.649330354    192.168.201.1   192.168.202.2   NFS      2408   V4 Reply (Call In 678) READDIR
    680 4.649380840    192.168.202.2   192.168.201.1   TCP      68     679 → 2049 [ACK] Seq=46569 Ack=311465 Win=1444 Len=0 TSval=913651443 TSecr=2646321683
    681 4.649716746    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 682) GETATTR FH: 0xb6d01f2a
    682 4.671167708    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 681) GETATTR
    683 4.671281003    192.168.202.2   192.168.201.1   NFS      288    V4 Call (Reply In 684) ACCESS FH: 0xb6d01f2a, [Check: RD LU MD XT DL]
    684 4.692647455    192.168.201.1   192.168.202.2   NFS      240    V4 Reply (Call In 683) ACCESS, [Allowed: RD LU MD XT DL]
    685 4.692825251    192.168.202.2   192.168.201.1   NFS      304    V4 Call (Reply In 690) READDIR FH: 0xb6d01f2a
    686 4.715060586    192.168.201.1   192.168.202.2   TCP      1405   2049 → 679 [ACK] Seq=311881 Ack=47237 Win=1452 Len=1337 TSval=2646321748 TSecr=913651486 [TCP segment of a reassembled PDU]
    687 4.715199557    192.168.201.1   192.168.202.2   TCP      1405   2049 → 679 [ACK] Seq=313218 Ack=47237 Win=1452 Len=1337 TSval=2646321748 TSecr=913651486 [TCP segment of a reassembled PDU]
    688 4.715215055    192.168.202.2   192.168.201.1   TCP      68     679 → 2049 [ACK] Seq=47237 Ack=314555 Win=1444 Len=0 TSval=913651509 TSecr=2646321748
    689 4.715524465    192.168.201.1   192.168.202.2   TCP      1405   2049 → 679 [ACK] Seq=314555 Ack=47237 Win=1452 Len=1337 TSval=2646321749 TSecr=913651486 [TCP segment of a reassembled PDU]
    690 4.715911571    192.168.201.1   192.168.202.2   NFS      1449   V4 Reply (Call In 685) READDIR

NFS directory reads later:

No.     Time           Source          Destination     Protocol Length Info
    664 9.485593049    192.168.202.2   192.168.201.1   NFS      304    V4 Call (Reply In 669) READDIR FH: 0x1933e99e
    665 9.507596250    192.168.201.1   192.168.202.2   TCP      1405   2049 → 788 [ACK] Seq=127921 Ack=65730 Win=3076 Len=1337 TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
    666 9.507717425    192.168.201.1   192.168.202.2   TCP      1405   2049 → 788 [ACK] Seq=129258 Ack=65730 Win=3076 Len=1337 TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
    667 9.507733352    192.168.202.2   192.168.201.1   TCP      68     788 → 2049 [ACK] Seq=65730 Ack=130595 Win=1444 Len=0 TSval=913106338 TSecr=2645776572
    668 9.507987020    192.168.201.1   192.168.202.2   TCP      1405   2049 → 788 [ACK] Seq=130595 Ack=65730 Win=3076 Len=1337 TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
    669 9.508456847    192.168.201.1   192.168.202.2   NFS      989    V4 Reply (Call In 664) READDIR
    670 9.508472149    192.168.202.2   192.168.201.1   TCP      68     788 → 2049 [ACK] Seq=65730 Ack=132853 Win=1444 Len=0 TSval=913106338 TSecr=2645776572
    671 9.508880627    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 672) GETATTR FH: 0x7e9e8300
    672 9.530375865    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 671) GETATTR
    673 9.530564317    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 674) GETATTR FH: 0xcb837ac9
    674 9.551906321    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 673) GETATTR
    675 9.552064038    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 676) GETATTR FH: 0xbf951d32
    676 9.574210528    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 675) GETATTR
    677 9.574334117    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 678) GETATTR FH: 0xd3f3dc3e
    678 9.595902902    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 677) GETATTR
    679 9.596025484    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 680) GETATTR FH: 0xf534332a
    680 9.617497794    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 679) GETATTR
    681 9.617621218    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 682) GETATTR FH: 0xa7e5bbc5
    682 9.639157371    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 681) GETATTR
    683 9.639279098    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 684) GETATTR FH: 0xa8050515
    684 9.660669335    192.168.201.1   192.168.202.2   NFS      312    V4 Reply (Call In 683) GETATTR
    685 9.660787725    192.168.202.2   192.168.201.1   NFS      304    V4 Call (Reply In 686) READDIR FH: 0x7e9e8300
    686 9.682612756    192.168.201.1   192.168.202.2   NFS      1472   V4 Reply (Call In 685) READDIR
    687 9.682646761    192.168.202.2   192.168.201.1   TCP      68     788 → 2049 [ACK] Seq=67450 Ack=135965 Win=1444 Len=0 TSval=913106513 TSecr=2645776747
    688 9.682906293    192.168.202.2   192.168.201.1   NFS      280    V4 Call (Reply In 689) GETATTR FH: 0xa8050515

Lots of GETATTR calls the second time around (each file ?).
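
To compare the two runs more systematically, one could count the NFSv4 operations in each trace. The following is a minimal Python sketch that tallies "V4 Call" entries by operation name from Wireshark's text summary; the regular expression and function name are my own, and the sample text is just a few packets copied from the second trace.

```python
import re
from collections import Counter

# Matches the Info column of an NFSv4 call, e.g.
# "V4 Call (Reply In 672) GETATTR FH: 0x7e9e8300", and captures the operation.
# Replies ("V4 Reply (Call In ...)") deliberately do not match.
CALL_RE = re.compile(r"\bV4 Call \(Reply In \d+\)\s+(\w+)")


def count_v4_calls(trace_text):
    """Count NFSv4 calls per operation name in a Wireshark text summary."""
    return Counter(CALL_RE.findall(trace_text))


# A few packets copied from the "directory reads later" trace.
sample = """
664 9.485593049 192.168.202.2 192.168.201.1 NFS 304 V4 Call (Reply In 669) READDIR FH: 0x1933e99e
671 9.508880627 192.168.202.2 192.168.201.1 NFS 280 V4 Call (Reply In 672) GETATTR FH: 0x7e9e8300
672 9.530375865 192.168.201.1 192.168.202.2 NFS 312 V4 Reply (Call In 671) GETATTR
673 9.530564317 192.168.202.2 192.168.201.1 NFS 280 V4 Call (Reply In 674) GETATTR FH: 0xcb837ac9
674 9.551906321 192.168.201.1 192.168.202.2 NFS 312 V4 Reply (Call In 673) GETATTR
685 9.660787725 192.168.202.2 192.168.201.1 NFS 304 V4 Call (Reply In 686) READDIR FH: 0x7e9e8300
"""

counts = count_v4_calls(sample)
for op, n in counts.most_common():
    print(f"{op:8s} {n}")
```

Run over the full text of each capture, the ratio of GETATTR to READDIR calls should show directly how much of the second run is spent on per-entry attribute revalidation, given that every call costs a full round trip (about 22ms on this link).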

NFS really is broken performance-wise these days, and it "appears" that significant/huge improvements are possible.

Anyone know what group/who is responsible for NFS protocol these days ?

Also what group/who is responsible for the Linux kernel's implementation of it ?



