Re: Fwd: Re: Fedora27: NFS v4 terrible write performance, is async working

On Mon, Feb 05, 2018 at 08:21:06AM +0000, Terry Barnaby wrote:
> On 01/02/18 08:29, Terry Barnaby wrote:
> > On 01/02/18 01:34, Jeremy Linton wrote:
> > > On 01/31/2018 09:49 AM, J. Bruce Fields wrote:
> > > > On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:
> > > > > Have you tried this with a '-o nfsvers=3' during mount? Did that help?
> > > > > 
> > > > > I noticed a large increase in my kernel build times across NFS/LAN
> > > > > a while back after a machine/kernel/10g upgrade. After playing with
> > > > > mount/export options, filesystem tuning, etc., I got to the point of
> > > > > timing a bunch of these operations against the older machine, at
> > > > > which point I discovered that simply backing down to NFSv3 solved
> > > > > the problem.
> > > > > 
> > > > > AKA an NFSv3 server on a 10-year-old 4-disk xfs RAID5 on 1Gb
> > > > > ethernet was outperforming a modern machine with an 8-disk xfs
> > > > > RAID5 on 10Gb running NFSv4. The effect was enough to change a
> > > > > kernel build from ~45 minutes down to less than 5.
> > > > 
> > Using NFSv3 in async mode is faster than NFSv4 in async mode (still
> > abysmal in sync mode).
> > 
> > NFSv3 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)
> > 
> > real    2m25.717s
> > user    0m8.739s
> > sys     0m13.362s
> > 
> > NFSv4 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)
> > 
> > real    3m33.032s
> > user    0m8.506s
> > sys     0m16.930s
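> > 
> > For reference, a sketch of the kind of setup behind these numbers (the
> > export path and server name here are only illustrative). The sync/async
> > being discussed is the server-side export option, and the client picks
> > the protocol version at mount time:
> > 
> >     # /etc/exports on the server: "async" lets the server reply to writes
> >     # before data reaches stable storage; "sync" (the default) does not
> >     /data2    192.168.202.0/24(rw,async,no_subtree_check)
> > 
> >     # on the client, the same test against v3 and v4 mounts of one export
> >     mount -t nfs -o nfsvers=3 server:/data2 /data2
> >     sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)
> >     umount /data2
> >     mount -t nfs -o nfsvers=4 server:/data2 /data2
> >     sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)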
> > 
> > NFSv3 async: wireshark trace
> > 
> > No.     Time           Source                Destination           Protocol Length Info
> >   18527 2.815884979    192.168.202.2         192.168.202.1         NFS      216    V3 CREATE Call (Reply In 18528), DH: 0x62f39428/dma.h Mode: EXCLUSIVE
> >   18528 2.816362338    192.168.202.1         192.168.202.2         NFS      328    V3 CREATE Reply (Call In 18527)
> >   18529 2.816418841    192.168.202.2         192.168.202.1         NFS      224    V3 SETATTR Call (Reply In 18530), FH: 0x13678ba0
> >   18530 2.816871820    192.168.202.1         192.168.202.2         NFS      216    V3 SETATTR Reply (Call In 18529)
> >   18531 2.816966771    192.168.202.2         192.168.202.1         NFS      1148   V3 WRITE Call (Reply In 18532), FH: 0x13678ba0 Offset: 0 Len: 934 FILE_SYNC
> >   18532 2.817441291    192.168.202.1         192.168.202.2         NFS      208    V3 WRITE Reply (Call In 18531) Len: 934 FILE_SYNC
> >   18533 2.817495775    192.168.202.2         192.168.202.1         NFS      236    V3 SETATTR Call (Reply In 18534), FH: 0x13678ba0
> >   18534 2.817920346    192.168.202.1         192.168.202.2         NFS      216    V3 SETATTR Reply (Call In 18533)
> >   18535 2.818002910    192.168.202.2         192.168.202.1         NFS      216    V3 CREATE Call (Reply In 18536), DH: 0x62f39428/elf.h Mode: EXCLUSIVE
> >   18536 2.818492126    192.168.202.1         192.168.202.2         NFS      328    V3 CREATE Reply (Call In 18535)
> > 
> > This works out to about 2 ms per small file write, rather than 3 ms with
> > NFSv4; the extra GETATTR and CLOSE RPCs in NFSv4 account for the
> > difference.
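> > 
> > As a rough cross-check, taking ~0.5 ms per RPC round trip (see point 4
> > further down):
> > 
> >     NFSv3 per small file: CREATE + SETATTR + WRITE + SETATTR  = 4 RPCs x ~0.5 ms ~= 2 ms
> >     NFSv4 per small file: the same plus GETATTR + CLOSE       = 6 RPCs x ~0.5 ms ~= 3 ms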
> > 
> > So where I am:
> > 
> > 1. NFS in sync mode, at least on my two Fedora 27 systems for my usage,
> > is completely unusable (sync: 2 hours, async: 3 minutes, local disk: 13
> > seconds).
> > 
> > 2. NFS async mode is working, but small writes are still very slow.
> > 
> > 3. NFS in async mode is 30% faster with NFSv3 than with NFSv4 when
> > writing small files, due to the added latency of NFSv4's two extra RPC
> > calls.
> > 
> > I really think that in 2018 we should be able to have better NFS
> > performance when writing many small files, such as in software
> > development workloads. This would dramatically speed up any system using
> > NFS for this sort of work, and reduce power usage, all for the cost of
> > some improvements to the NFS protocol.
> > 
> > I don't know the details of whether this would work, or who is
> > responsible for NFS, but it would be good, if possible, to have some
> > improvements (NFSv4.3?). Maybe:
> > 
> > 1. Have an OPEN-SETATTR-WRITE RPC call all in one, and a SETATTR-CLOSE
> > call all in one. This would reduce the latency of a small file write to
> > 1 ms rather than 3 ms, i.e. 66% faster. It would require the client to
> > delay the OPEN/SETATTR until the first WRITE; I'm not sure how feasible
> > that is in the implementations. Maybe READs could be improved as well,
> > though getting the OPEN through quickly may matter more in that case?
> > (A rough sketch of what such a batched call might look like follows this
> > list.)
> > 
> > 2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call all in one
> > (0.5 ms vs 3 ms).
> > 
> > 3. On sync/async modes, personally I think it would be better for the
> > client to request sync or async at mount time. Setting sync on the
> > server side would just enforce sync mode for all clients; if the server
> > is in the default async mode, clients can mount using sync or async
> > according to their requirements. This seems to match normal VFS
> > semantics and usage patterns better.
> > 
> > 4. The 0.5 ms RPC latency seems a bit high (ICMP pings are 0.12 ms).
> > Maybe this is worth investigating in the Linux kernel processing (how?
> > one way to look at per-operation timings is sketched after this list).
> > 
> > 5. The 20 ms RPC latency I see in sync mode needs a look on my system,
> > although async mode is fine for my usage. Maybe it ends up as 2 x 10 ms
> > drive seeks on ext4 and is thus expected.
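> > 
> > For illustration only (not how current clients behave, just the idea in
> > point 1): NFSv4 already bundles operations into a single COMPOUND
> > request, so in principle the whole small-file create could go out as one
> > round trip, something like:
> > 
> >     COMPOUND { PUTFH(dir); OPEN(CREATE "dma.h"); GETFH; SETATTR(mode); WRITE(off=0, len=934); CLOSE }
> > 
> > The awkward part, as noted above, is that the client would have to hold
> > back the OPEN until it has the first WRITE's data in hand.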
> > 
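> > One way to look at the per-operation timings behind point 4 (a sketch;
> > the mount point here is just an example): the kernel keeps per-op RPC
> > statistics, including round-trip times, in /proc/self/mountstats, and
> > the nfs-utils tools summarise them:
> > 
> >     cat /proc/self/mountstats        # raw per-op counts, queue time, RTT
> >     mountstats /data2                # per-op summary for one mount
> >     nfsiostat 5 /data2               # average RTT/exec time per op, every 5 s
> > 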
> Yet another poor NFS performance issue: if I do an "ls -lR" of a certain
> NFS-mounted directory over a slow link (NFS over OpenVPN over FTTP
> 80/20 Mbps), just after mounting the file system (default NFSv4 mount with
> async), it takes about 9 seconds. If I run the same "ls -lR" again, just
> after, it takes about 60 seconds.

A wireshark trace might help.

Also, is it possible some process is writing while this is happening?

--b.

> So much for caching! I have noticed Makefile-based builds (over 1 Gbps
> Ethernet) taking a long time, with a second or so between each directory;
> I think this may be why.
> 
> Listing the directory using an NFSv3 mount takes 67 seconds on the first
> listing after mounting and about the same on subsequent ones. No
> noticeable caching (default mount options with async). At least NFSv4 is
> fast the first time!
> 
> NFSv4 directory reads after mount:
> 
> No.     Time           Source                Destination           Protocol Length Info
>     667 4.560833210    192.168.202.2         192.168.201.1         NFS      304    V4 Call (Reply In 672) READDIR FH: 0xde55a546
>     668 4.582809439    192.168.201.1         192.168.202.2         TCP      1405   2049 → 679 [ACK] Seq=304477 Ack=45901 Win=1452 Len=1337 TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
>     669 4.582986377    192.168.201.1         192.168.202.2         TCP      1405   2049 → 679 [ACK] Seq=305814 Ack=45901 Win=1452 Len=1337 TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
>     670 4.583003805    192.168.202.2         192.168.201.1         TCP      68     679 → 2049 [ACK] Seq=45901 Ack=307151 Win=1444 Len=0 TSval=913651376 TSecr=2646321616
>     671 4.583265423    192.168.201.1         192.168.202.2         TCP      1405   2049 → 679 [ACK] Seq=307151 Ack=45901 Win=1452 Len=1337 TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
>     672 4.583280603    192.168.201.1         192.168.202.2         NFS      289    V4 Reply (Call In 667) READDIR
>     673 4.583291818    192.168.202.2         192.168.201.1         TCP      68     679 → 2049 [ACK] Seq=45901 Ack=308709 Win=1444 Len=0 TSval=913651377 TSecr=2646321616
>     674 4.583819172    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 675) GETATTR FH: 0xb91bfde7
>     675 4.605389953    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 674) GETATTR
>     676 4.605491075    192.168.202.2         192.168.201.1         NFS      288    V4 Call (Reply In 677) ACCESS FH: 0xb91bfde7, [Check: RD LU MD XT DL]
>     677 4.626848306    192.168.201.1         192.168.202.2         NFS      240    V4 Reply (Call In 676) ACCESS, [Allowed: RD LU MD XT DL]
>     678 4.626993773    192.168.202.2         192.168.201.1         NFS      304    V4 Call (Reply In 679) READDIR FH: 0xb91bfde7
>     679 4.649330354    192.168.201.1         192.168.202.2         NFS      2408   V4 Reply (Call In 678) READDIR
>     680 4.649380840    192.168.202.2         192.168.201.1         TCP      68     679 → 2049 [ACK] Seq=46569 Ack=311465 Win=1444 Len=0 TSval=913651443 TSecr=2646321683
>     681 4.649716746    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 682) GETATTR FH: 0xb6d01f2a
>     682 4.671167708    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 681) GETATTR
>     683 4.671281003    192.168.202.2         192.168.201.1         NFS      288    V4 Call (Reply In 684) ACCESS FH: 0xb6d01f2a, [Check: RD LU MD XT DL]
>     684 4.692647455    192.168.201.1         192.168.202.2         NFS      240    V4 Reply (Call In 683) ACCESS, [Allowed: RD LU MD XT DL]
>     685 4.692825251    192.168.202.2         192.168.201.1         NFS      304    V4 Call (Reply In 690) READDIR FH: 0xb6d01f2a
>     686 4.715060586    192.168.201.1         192.168.202.2         TCP      1405   2049 → 679 [ACK] Seq=311881 Ack=47237 Win=1452 Len=1337 TSval=2646321748 TSecr=913651486 [TCP segment of a reassembled PDU]
>     687 4.715199557    192.168.201.1         192.168.202.2         TCP      1405   2049 → 679 [ACK] Seq=313218 Ack=47237 Win=1452 Len=1337 TSval=2646321748 TSecr=913651486 [TCP segment of a reassembled PDU]
>     688 4.715215055    192.168.202.2         192.168.201.1         TCP      68     679 → 2049 [ACK] Seq=47237 Ack=314555 Win=1444 Len=0 TSval=913651509 TSecr=2646321748
>     689 4.715524465    192.168.201.1         192.168.202.2         TCP      1405   2049 → 679 [ACK] Seq=314555 Ack=47237 Win=1452 Len=1337 TSval=2646321749 TSecr=913651486 [TCP segment of a reassembled PDU]
>     690 4.715911571    192.168.201.1         192.168.202.2         NFS      1449   V4 Reply (Call In 685) READDIR
> 
> NFSv4 directory reads on the later, repeated listing:
> 
> No.     Time           Source                Destination           Protocol Length Info
>     664 9.485593049    192.168.202.2         192.168.201.1         NFS      304    V4 Call (Reply In 669) READDIR FH: 0x1933e99e
>     665 9.507596250    192.168.201.1         192.168.202.2         TCP      1405   2049 → 788 [ACK] Seq=127921 Ack=65730 Win=3076 Len=1337 TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
>     666 9.507717425    192.168.201.1         192.168.202.2         TCP      1405   2049 → 788 [ACK] Seq=129258 Ack=65730 Win=3076 Len=1337 TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
>     667 9.507733352    192.168.202.2         192.168.201.1         TCP      68     788 → 2049 [ACK] Seq=65730 Ack=130595 Win=1444 Len=0 TSval=913106338 TSecr=2645776572
>     668 9.507987020    192.168.201.1         192.168.202.2         TCP      1405   2049 → 788 [ACK] Seq=130595 Ack=65730 Win=3076 Len=1337 TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
>     669 9.508456847    192.168.201.1         192.168.202.2         NFS      989    V4 Reply (Call In 664) READDIR
>     670 9.508472149    192.168.202.2         192.168.201.1         TCP      68     788 → 2049 [ACK] Seq=65730 Ack=132853 Win=1444 Len=0 TSval=913106338 TSecr=2645776572
>     671 9.508880627    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 672) GETATTR FH: 0x7e9e8300
>     672 9.530375865    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 671) GETATTR
>     673 9.530564317    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 674) GETATTR FH: 0xcb837ac9
>     674 9.551906321    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 673) GETATTR
>     675 9.552064038    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 676) GETATTR FH: 0xbf951d32
>     676 9.574210528    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 675) GETATTR
>     677 9.574334117    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 678) GETATTR FH: 0xd3f3dc3e
>     678 9.595902902    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 677) GETATTR
>     679 9.596025484    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 680) GETATTR FH: 0xf534332a
>     680 9.617497794    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 679) GETATTR
>     681 9.617621218    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 682) GETATTR FH: 0xa7e5bbc5
>     682 9.639157371    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 681) GETATTR
>     683 9.639279098    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 684) GETATTR FH: 0xa8050515
>     684 9.660669335    192.168.201.1         192.168.202.2         NFS      312    V4 Reply (Call In 683) GETATTR
>     685 9.660787725    192.168.202.2         192.168.201.1         NFS      304    V4 Call (Reply In 686) READDIR FH: 0x7e9e8300
>     686 9.682612756    192.168.201.1         192.168.202.2         NFS      1472   V4 Reply (Call In 685) READDIR
>     687 9.682646761    192.168.202.2         192.168.201.1         TCP      68     788 → 2049 [ACK] Seq=67450 Ack=135965 Win=1444 Len=0 TSval=913106513 TSecr=2645776747
>     688 9.682906293    192.168.202.2         192.168.201.1         NFS      280    V4 Call (Reply In 689) GETATTR FH: 0xa8050515
> 
> Lots of GETATTR calls the second time around (one for each file?).
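> 
> If that is attribute-cache revalidation, a way to poke at it (the paths
> and the timeout value here are just examples) is to time the listing
> twice and then lengthen the attribute-cache timeouts, which default to
> roughly 3-60 seconds:
> 
>     time ls -lR /mnt/nfs > /dev/null     # first, cold run
>     time ls -lR /mnt/nfs > /dev/null     # repeat; should mostly hit the client cache
>     umount /mnt/nfs
>     mount -t nfs -o nfsvers=4,actimeo=600 server:/export /mnt/nfs
>     time ls -lR /mnt/nfs > /dev/null     # repeat with a longer attribute cache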
> 
> NFS really is broken performance-wise these days, and it "appears" that
> significant improvements are possible.
> 
> Does anyone know what group, or who, is responsible for the NFS protocol
> these days?
> 
> Also, what group, or who, is responsible for the Linux kernel's
> implementation of it?
> 
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx



