On 01/02/18 08:29, Terry Barnaby wrote:
On 01/02/18 01:34, Jeremy Linton wrote:
On 01/31/2018 09:49 AM, J. Bruce Fields wrote:
On Tue, Jan 30, 2018 at 01:52:49PM -0600, Jeremy Linton wrote:
Have you tried this with a '-o nfsvers=3' during mount? Did that help?
I noticed a large slowdown in my kernel builds across NFS/LAN a while
back, after a machine/kernel/10g upgrade. After playing with mount/export
options, filesystem tuning, etc., I got to the point of timing a bunch of
these operations against the older machine, at which point I discovered
that simply backing down to NFSv3 solved the problem.
AKA an NFSv3 server on a 10 year old 4 disk xfs RAID5 on 1Gb ethernet was
faster than a modern machine with an 8 disk xfs RAID5 on 10Gb on NFSv4.
The effect was enough to change a kernel build from ~45 minutes down to
less than 5.
Using NFSv3 in async mode is faster than NFSv4 in async mode (still
abysmal in sync mode).
NFSv3 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)
real 2m25.717s
user 0m8.739s
sys 0m13.362s
NFSv4 async: sync; time (tar -xf linux-4.14.15.tar.gz -C /data2/tmp; sync)
real 3m33.032s
user 0m8.506s
sys 0m16.930s
NFSv3 async: wireshark trace
No. Time Source Destination Protocol Length Info
18527 2.815884979 192.168.202.2 192.168.202.1 NFS
216 V3 CREATE Call (Reply In 18528), DH: 0x62f39428/dma.h Mode:
EXCLUSIVE
18528 2.816362338 192.168.202.1 192.168.202.2 NFS
328 V3 CREATE Reply (Call In 18527)
18529 2.816418841 192.168.202.2 192.168.202.1 NFS
224 V3 SETATTR Call (Reply In 18530), FH: 0x13678ba0
18530 2.816871820 192.168.202.1 192.168.202.2 NFS
216 V3 SETATTR Reply (Call In 18529)
18531 2.816966771 192.168.202.2 192.168.202.1 NFS
1148 V3 WRITE Call (Reply In 18532), FH: 0x13678ba0 Offset: 0 Len:
934 FILE_SYNC
18532 2.817441291 192.168.202.1 192.168.202.2 NFS
208 V3 WRITE Reply (Call In 18531) Len: 934 FILE_SYNC
18533 2.817495775 192.168.202.2 192.168.202.1 NFS
236 V3 SETATTR Call (Reply In 18534), FH: 0x13678ba0
18534 2.817920346 192.168.202.1 192.168.202.2 NFS
216 V3 SETATTR Reply (Call In 18533)
18535 2.818002910 192.168.202.2 192.168.202.1 NFS
216 V3 CREATE Call (Reply In 18536), DH: 0x62f39428/elf.h Mode:
EXCLUSIVE
18536 2.818492126 192.168.202.1 192.168.202.2 NFS
328 V3 CREATE Reply (Call In 18535)
This is taking about 2ms for a small file write rather than 3ms for
NFSv4. There is an extra GETATTR and CLOSE RPC in NFSv4 accounting for
the difference.
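As a cross-check of the 2ms figure, here is a minimal Python sketch that pairs each Call with its Reply using the frame times copied from the NFSv3 trace above. The names FRAMES, rpc_latencies_ms and per_file_ms are mine, purely for illustration.

```python
from decimal import Decimal

# (frame number, time in seconds, description) copied from the NFSv3 trace.
FRAMES = [
    (18527, "2.815884979", "CREATE Call"),
    (18528, "2.816362338", "CREATE Reply"),
    (18529, "2.816418841", "SETATTR Call"),
    (18530, "2.816871820", "SETATTR Reply"),
    (18531, "2.816966771", "WRITE Call"),
    (18532, "2.817441291", "WRITE Reply"),
    (18533, "2.817495775", "SETATTR Call"),
    (18534, "2.817920346", "SETATTR Reply"),
    (18535, "2.818002910", "CREATE Call"),  # start of the next file
]


def rpc_latencies_ms(frames):
    """Pair each Call with the Reply that follows it; return (op, ms) tuples."""
    out = []
    for (_, t_call, call), (_, t_reply, reply) in zip(frames[0::2], frames[1::2]):
        op = call.split()[0]
        assert call.endswith("Call") and reply == f"{op} Reply"
        out.append((op, (Decimal(t_reply) - Decimal(t_call)) * 1000))
    return out


def per_file_ms(frames):
    """Time from the first CREATE Call to the next file's CREATE Call."""
    return (Decimal(frames[-1][1]) - Decimal(frames[0][1])) * 1000


if __name__ == "__main__":
    for op, ms in rpc_latencies_ms(FRAMES):
        print(f"{op:8s} {float(ms):.3f} ms")
    print(f"per file {float(per_file_ms(FRAMES)):.3f} ms")
```

For this one file the four round trips average about 0.46ms each and the whole sequence takes about 2.1ms, so most of the time is spent waiting on the server replies, with the remainder being the gaps on the client between one reply and the next call.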
So where I am:
1. NFS in sync mode, at least on my two Fedora27 systems for my usage
is completely unusable. (sync: 2 hours, async: 3 minutes, localdisk:
13 seconds).
2. NFS async mode is working, but the small writes are still very slow.
3. NFS in async mode is 30% better with NFSv3 than NFSv4 when writing
small files due to the increased latency caused by NFSv4's two extra
RPC calls.
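For reference, the 30% figure in item 3 can be checked from the two "real" times reported by time above. This is only a small Python sketch; to_seconds is a helper name I made up for it.

```python
def to_seconds(real: str) -> float:
    """Convert a `time` value such as '2m25.717s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


nfsv3 = to_seconds("2m25.717s")  # NFSv3 async tar extract
nfsv4 = to_seconds("3m33.032s")  # NFSv4 async tar extract

saving = (nfsv4 - nfsv3) / nfsv4    # fraction of the NFSv4 time saved by NFSv3
slowdown = (nfsv4 - nfsv3) / nfsv3  # extra time NFSv4 takes relative to NFSv3

if __name__ == "__main__":
    print(f"NFSv3 takes {saving:.1%} less time than NFSv4")
    print(f"NFSv4 takes {slowdown:.1%} longer than NFSv3")
```

Depending on which way round you express it, NFSv3 takes roughly 32% less time, or NFSv4 takes roughly 46% longer.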
I really think that in 2018 we should be able to have better NFS
performance when writing many small files such as used in software
development. This would speed up any system that was using NFS with
this sort of workload dramatically and reduce power usage all for some
improvements in the NFS protocol.
I don't know whether this would work, or who is responsible for NFS,
but it would be good, if possible, to have some improvements
(NFSv4.3 ?). Maybe:
1. Have an OPEN-SETATTR-WRITE RPC call all in one and a SETATTR-CLOSE
call all in one. This would reduce the latency of a small file to 1ms
rather than 3ms thus 66% faster. Would require the client to delay the
OPEN/SETATTR until the first WRITE. Not sure how possible this is in
the implementations. Maybe READs could be improved as well, but
getting the OPEN through quickly may be better in this case ?
2. Could go further with an OPEN-SETATTR-WRITE-CLOSE RPC call. (0.5ms
vs 3ms).
3. On sync/async modes personally I think it would be better for the
client to request the mount in sync/async mode. The setting of sync on
the server side would just enforce sync mode for all clients. If the
server is in the default async mode clients can mount using sync or
async as to their requirements. This seems to match normal VFS
semantics and usage patterns better.
4. The 0.5ms RPC latency seems a bit high (ICMP pings take 0.12ms). Maybe
this is worth investigating in the Linux kernel processing (how ?) ?
5. The 20ms RPC latency I see in sync mode needs looking at on my
system, although async mode is fine for my usage. Maybe this ends up as
2 x 10ms drive seeks on ext4 and is thus expected.
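The estimates in items 1 and 2 come from a very simple model: a small file write costs one round trip per RPC, issued strictly one after another, at about 0.5ms each. A rough Python sketch of that model follows; the names are mine, and the model ignores client-side processing and any overlap between RPCs, so treat the numbers as back-of-envelope only.

```python
RPC_MS = 0.5  # approximate per-RPC round trip seen in the async traces


def per_file_ms(rpcs: int, rpc_ms: float = RPC_MS) -> float:
    """Latency of writing one small file if its RPCs run strictly in sequence."""
    return rpcs * rpc_ms


# Number of sequential RPCs per small file in each case.
SCENARIOS = {
    "NFSv3 today (CREATE, SETATTR, WRITE, SETATTR)": 4,
    "NFSv4 today (two extra RPCs: GETATTR, CLOSE)": 6,
    "proposal 1 (OPEN-SETATTR-WRITE + SETATTR-CLOSE)": 2,
    "proposal 2 (OPEN-SETATTR-WRITE-CLOSE)": 1,
}

if __name__ == "__main__":
    baseline = per_file_ms(SCENARIOS["NFSv4 today (two extra RPCs: GETATTR, CLOSE)"])
    for name, rpcs in SCENARIOS.items():
        ms = per_file_ms(rpcs)
        print(f"{name}: {ms:.1f} ms ({1 - ms / baseline:.0%} less than NFSv4 today)")
```

With these assumptions the model reproduces the 2ms and 3ms seen in the traces and the 1ms and 0.5ms hoped for in the proposals.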
Yet another poor NFS performance issue. If I do a "ls -lR" of a
certain NFS mounted directory over a slow link (NFS over Openvpn over
FTTP 80/20Mbps), just after mounting the file system (default NFSv4
mount with async), it takes about 9 seconds. If I run the same "ls -lR"
again, just after, it takes about 60 seconds. So much for caching ! I
have noticed Makefile based builds (over Ethernet 1Gbps) taking a long
time, with a second or so between each directory. I think this may be why.
Listing the directory using a NFSv3 mount takes 67 seconds on the first
run after mounting and about the same on subsequent ones. No noticeable
caching (default mount options with async). At least NFSv4 is fast the
first time !
NFSv4 directory reads after mount:
No. Time Source Destination Protocol Length Info
667 4.560833210 192.168.202.2 192.168.201.1 NFS
304 V4 Call (Reply In 672) READDIR FH: 0xde55a546
668 4.582809439 192.168.201.1 192.168.202.2 TCP
1405 2049 → 679 [ACK] Seq=304477 Ack=45901 Win=1452 Len=1337
TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
669 4.582986377 192.168.201.1 192.168.202.2 TCP
1405 2049 → 679 [ACK] Seq=305814 Ack=45901 Win=1452 Len=1337
TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
670 4.583003805 192.168.202.2 192.168.201.1 TCP
68 679 → 2049 [ACK] Seq=45901 Ack=307151 Win=1444 Len=0
TSval=913651376 TSecr=2646321616
671 4.583265423 192.168.201.1 192.168.202.2 TCP
1405 2049 → 679 [ACK] Seq=307151 Ack=45901 Win=1452 Len=1337
TSval=2646321616 TSecr=913651354 [TCP segment of a reassembled PDU]
672 4.583280603 192.168.201.1 192.168.202.2 NFS
289 V4 Reply (Call In 667) READDIR
673 4.583291818 192.168.202.2 192.168.201.1 TCP
68 679 → 2049 [ACK] Seq=45901 Ack=308709 Win=1444 Len=0
TSval=913651377 TSecr=2646321616
674 4.583819172 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 675) GETATTR FH: 0xb91bfde7
675 4.605389953 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 674) GETATTR
676 4.605491075 192.168.202.2 192.168.201.1 NFS
288 V4 Call (Reply In 677) ACCESS FH: 0xb91bfde7, [Check: RD LU MD XT DL]
677 4.626848306 192.168.201.1 192.168.202.2 NFS
240 V4 Reply (Call In 676) ACCESS, [Allowed: RD LU MD XT DL]
678 4.626993773 192.168.202.2 192.168.201.1 NFS
304 V4 Call (Reply In 679) READDIR FH: 0xb91bfde7
679 4.649330354 192.168.201.1 192.168.202.2 NFS
2408 V4 Reply (Call In 678) READDIR
680 4.649380840 192.168.202.2 192.168.201.1 TCP
68 679 → 2049 [ACK] Seq=46569 Ack=311465 Win=1444 Len=0
TSval=913651443 TSecr=2646321683
681 4.649716746 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 682) GETATTR FH: 0xb6d01f2a
682 4.671167708 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 681) GETATTR
683 4.671281003 192.168.202.2 192.168.201.1 NFS
288 V4 Call (Reply In 684) ACCESS FH: 0xb6d01f2a, [Check: RD LU MD XT DL]
684 4.692647455 192.168.201.1 192.168.202.2 NFS
240 V4 Reply (Call In 683) ACCESS, [Allowed: RD LU MD XT DL]
685 4.692825251 192.168.202.2 192.168.201.1 NFS
304 V4 Call (Reply In 690) READDIR FH: 0xb6d01f2a
686 4.715060586 192.168.201.1 192.168.202.2 TCP
1405 2049 → 679 [ACK] Seq=311881 Ack=47237 Win=1452 Len=1337
TSval=2646321748 TSecr=913651486 [TCP segment of a reassembled PDU]
687 4.715199557 192.168.201.1 192.168.202.2 TCP
1405 2049 → 679 [ACK] Seq=313218 Ack=47237 Win=1452 Len=1337
TSval=2646321748 TSecr=913651486 [TCP segment of a reassembled PDU]
688 4.715215055 192.168.202.2 192.168.201.1 TCP
68 679 → 2049 [ACK] Seq=47237 Ack=314555 Win=1444 Len=0
TSval=913651509 TSecr=2646321748
689 4.715524465 192.168.201.1 192.168.202.2 TCP
1405 2049 → 679 [ACK] Seq=314555 Ack=47237 Win=1452 Len=1337
TSval=2646321749 TSecr=913651486 [TCP segment of a reassembled PDU]
690 4.715911571 192.168.201.1 192.168.202.2 NFS
1449 V4 Reply (Call In 685) READDIR
NFS directory reads later:
No. Time Source Destination Protocol Length Info
664 9.485593049 192.168.202.2 192.168.201.1 NFS
304 V4 Call (Reply In 669) READDIR FH: 0x1933e99e
665 9.507596250 192.168.201.1 192.168.202.2 TCP
1405 2049 → 788 [ACK] Seq=127921 Ack=65730 Win=3076 Len=1337
TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
666 9.507717425 192.168.201.1 192.168.202.2 TCP
1405 2049 → 788 [ACK] Seq=129258 Ack=65730 Win=3076 Len=1337
TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
667 9.507733352 192.168.202.2 192.168.201.1 TCP
68 788 → 2049 [ACK] Seq=65730 Ack=130595 Win=1444 Len=0
TSval=913106338 TSecr=2645776572
668 9.507987020 192.168.201.1 192.168.202.2 TCP
1405 2049 → 788 [ACK] Seq=130595 Ack=65730 Win=3076 Len=1337
TSval=2645776572 TSecr=913106316 [TCP segment of a reassembled PDU]
669 9.508456847 192.168.201.1 192.168.202.2 NFS
989 V4 Reply (Call In 664) READDIR
670 9.508472149 192.168.202.2 192.168.201.1 TCP
68 788 → 2049 [ACK] Seq=65730 Ack=132853 Win=1444 Len=0
TSval=913106338 TSecr=2645776572
671 9.508880627 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 672) GETATTR FH: 0x7e9e8300
672 9.530375865 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 671) GETATTR
673 9.530564317 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 674) GETATTR FH: 0xcb837ac9
674 9.551906321 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 673) GETATTR
675 9.552064038 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 676) GETATTR FH: 0xbf951d32
676 9.574210528 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 675) GETATTR
677 9.574334117 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 678) GETATTR FH: 0xd3f3dc3e
678 9.595902902 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 677) GETATTR
679 9.596025484 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 680) GETATTR FH: 0xf534332a
680 9.617497794 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 679) GETATTR
681 9.617621218 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 682) GETATTR FH: 0xa7e5bbc5
682 9.639157371 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 681) GETATTR
683 9.639279098 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 684) GETATTR FH: 0xa8050515
684 9.660669335 192.168.201.1 192.168.202.2 NFS
312 V4 Reply (Call In 683) GETATTR
685 9.660787725 192.168.202.2 192.168.201.1 NFS
304 V4 Call (Reply In 686) READDIR FH: 0x7e9e8300
686 9.682612756 192.168.201.1 192.168.202.2 NFS
1472 V4 Reply (Call In 685) READDIR
687 9.682646761 192.168.202.2 192.168.201.1 TCP
68 788 → 2049 [ACK] Seq=67450 Ack=135965 Win=1444 Len=0
TSval=913106513 TSecr=2645776747
688 9.682906293 192.168.202.2 192.168.201.1 NFS
280 V4 Call (Reply In 689) GETATTR FH: 0xa8050515
Lots of GETATTR calls the second time around (each file ?).
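To put a rough number on that, here is a small Python sketch that counts the client calls by operation in the excerpt of the later trace above, and estimates how many strictly sequential round trips would fit into the 60 seconds. The Info strings are copied from frames 664 to 688; the function names are mine, and the estimate assumes the whole time is spent waiting on one RPC at a time, which I have not verified.

```python
import re
from collections import Counter

# Info column of the client calls in the later trace (frames 664-688).
CALLS = """\
V4 Call (Reply In 669) READDIR FH: 0x1933e99e
V4 Call (Reply In 672) GETATTR FH: 0x7e9e8300
V4 Call (Reply In 674) GETATTR FH: 0xcb837ac9
V4 Call (Reply In 676) GETATTR FH: 0xbf951d32
V4 Call (Reply In 678) GETATTR FH: 0xd3f3dc3e
V4 Call (Reply In 680) GETATTR FH: 0xf534332a
V4 Call (Reply In 682) GETATTR FH: 0xa7e5bbc5
V4 Call (Reply In 684) GETATTR FH: 0xa8050515
V4 Call (Reply In 686) READDIR FH: 0x7e9e8300
V4 Call (Reply In 689) GETATTR FH: 0xa8050515
"""

CALL_RE = re.compile(r"V4 Call \(Reply In \d+\) (\w+)")


def count_ops(text: str) -> Counter:
    """Count NFSv4 calls by operation name in wireshark Info lines."""
    return Counter(CALL_RE.findall(text))


def sequential_round_trips(total_s: float, rtt_s: float) -> float:
    """How many back-to-back round trips of rtt_s fit into total_s."""
    return total_s / rtt_s


# Round trip of the first GETATTR in the excerpt (frames 671 and 672).
GETATTR_RTT_S = 9.530375865 - 9.508880627

if __name__ == "__main__":
    print(count_ops(CALLS))
    print(round(sequential_round_trips(60, GETATTR_RTT_S)))
```

In this excerpt there are 8 GETATTR calls for 2 READDIR calls, and at about 21.5ms per round trip over the VPN link, 60 seconds would correspond to something like 2800 RPCs issued one at a time.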
NFS really is broken performance-wise these days, and it "appears"
that significant improvements are possible.
Does anyone know which group, or who, is responsible for the NFS protocol
these days? Also, which group, or who, is responsible for the Linux
kernel's implementation of it?