Re-sending without attachments.

The capture files can be found at:

client <-> mds: https://desycloud.desy.de/index.php/s/58JFyfMQmNF99pU
client <-> ds: https://desycloud.desy.de/index.php/s/dKf290ikQcifL9K

Tigran.

----- Original Message -----
> From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@xxxxxxx>
> To: "Linux NFS Mailing list" <linux-nfs@xxxxxxxxxxxxxxx>
> Cc: "Steve Dickson" <steved@xxxxxxxxxx>
> Sent: Monday, March 20, 2017 4:52:40 PM
> Subject: pNFS: invalid IP:port selection when talks to DS

> Dear (p)NFS-ors,
>
> we observe a VERY unpleasant situation with pNFS in production.
> Our hosts run multiple DSes on different ports, usually 24001-24009.
> With CentOS7 (3.10.0-514.6.2.el7.x86_64) we see that the client picks
> the wrong port number when it talks to a data server:
>
> If the client uses different DSes on the same host, then at some point
> it starts to send data to the wrong port number:
>
> Client <=> MDS:
>
>  1 0.000000000 131.169.251.53 → 131.169.51.35 NFS V4 Call OPEN DH: 0x7cbc716b/MIL-68-onebatch-80C-30s-00057.tif.metadata
>  2 0.001469799 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 1) OPEN StateID: 0xec18
>  3 0.001578128 131.169.251.53 → 131.169.51.35 NFS V4 Call SETATTR FH: 0x6ccf3dfa
>  4 0.002657187 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 3) SETATTR
>  5 0.003243819 131.169.251.53 → 131.169.51.35 NFS V4 Call LAYOUTGET
>  6 0.014603386 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 5) LAYOUTGET
>  7 0.014899121 131.169.251.53 → 131.169.51.35 NFS V4 Call GETDEVINFO
>  8 0.015014216 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 7) GETDEVINFO
>     Opcode: GETDEVINFO (47)
>         Status: NFS4_OK (0)
>         layout type: LAYOUT4_NFSV4_1_FILES (1)
>         device index: 0
>         r_netid: tcp
>             length: 3
>             contents: tcp
>             fill bytes: opaque data
>         r_addr: 131.169.51.50.93.197
>             length: 20
>             contents: 131.169.51.50.93.197
>         r_netid: tcp
>             length: 3
>             contents: tcp
>             fill bytes: opaque data
>         r_addr: 131.169.51.50.93.197
>             length: 20
>             contents: 131.169.51.50.93.197
>         notification bitmap: 6
>         notification bitmap: 0
>     [Main Opcode: GETDEVINFO (47)]
>
>  9 0.105442455 131.169.251.53 → 131.169.51.35 NFS V4 Call TEST_STATEID
> 10 0.105521354 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 9) TEST_STATEID
>
> NOTICE that 131.169.51.50.93.197 corresponds to port 24005.
>
> Client <=> DS:
>
> $ tshark -r ds-write.pcap -n -z conv,tcp
>  1 0.000000 131.169.251.53 → 131.169.51.50 NFS V4 Call WRITE StateID: 0xff01 Offset: 0 Len: 3968
>  2 0.000090 131.169.51.50 → 131.169.251.53 NFS V4 Reply (Call In 1) WRITE Status: NFS4ERR_BAD_STATEID
> ================================================================================
> TCP Conversations
> Filter:<No Filter>
>                                            |      <-      | |      ->      | |    Total     |  Relative   | Duration |
>                                            | Frames Bytes | | Frames Bytes | | Frames Bytes |    Start    |          |
> 131.169.51.50:24006 <-> 131.169.251.53:847      1   4240        1    168        2   4408    0.000000000     0.0001
> ================================================================================
>
> NOTICE that it talks to the DS on port 24006!
>
> Is there a known fix which is missing in CentOS7? I can't reproduce it with
> a 4.9 kernel (or it's harder to reproduce).
>
> The packet captures are attached.
>
> Tigran.
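
For readers checking the port arithmetic: the r_addr in the GETDEVINFO reply is an IPv4 universal address of the form h1.h2.h3.h4.p1.p2, where the port is p1 * 256 + p2. The sketch below is only an illustration of that conversion; the helper names `parse_uaddr` and `port_suffix` are hypothetical and are not part of any NFS tooling.

```python
from typing import Tuple


def parse_uaddr(uaddr: str) -> Tuple[str, int]:
    """Split an IPv4 universal address "h1.h2.h3.h4.p1.p2" into (host, port)."""
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(parts)}")
    fields = [int(p) for p in parts]
    if any(not 0 <= f <= 255 for f in fields):
        raise ValueError("each field must be in the range 0..255")
    host = ".".join(parts[:4])
    # The last two fields are the high and low bytes of the port.
    port = fields[4] * 256 + fields[5]
    return host, port


def port_suffix(port: int) -> str:
    """Return the "p1.p2" suffix that encodes the given port (hypothetical helper)."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in the range 0..65535")
    return f"{port // 256}.{port % 256}"


if __name__ == "__main__":
    # Address advertised by the MDS in the GETDEVINFO reply.
    print(parse_uaddr("131.169.51.50.93.197"))  # ('131.169.51.50', 24005)
    # Suffix that the port actually used for the WRITE would have had.
    print(port_suffix(24006))  # 93.198
```

The advertised suffix 93.197 decodes to 24005, while the port seen in the DS capture, 24006, would be encoded as 93.198, so the client is connecting to an address the MDS did not return for this device.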