pNFS: invalid IP:port selection when talks to DS

Dear (p)NFS-ors,

we are seeing a VERY unpleasant situation with pNFS in production.
Our hosts run multiple DSes on different ports, usually 24001-24009.
With CentOS 7 (3.10.0-514.6.2.el7.x86_64) we see that the client picks
the wrong port number when it talks to a data server.

If the client uses different DSes on the same host, then at some point it
starts to send data to the wrong port:

Client <=> MDS:


    1 0.000000000 131.169.251.53 → 131.169.51.35 NFS V4 Call OPEN DH: 0x7cbc716b/MIL-68-onebatch-80C-30s-00057.tif.metadata
    2 0.001469799 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 1) OPEN StateID: 0xec18
    3 0.001578128 131.169.251.53 → 131.169.51.35 NFS V4 Call SETATTR FH: 0x6ccf3dfa
    4 0.002657187 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 3) SETATTR
    5 0.003243819 131.169.251.53 → 131.169.51.35 NFS V4 Call LAYOUTGET
    6 0.014603386 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 5) LAYOUTGET
    7 0.014899121 131.169.251.53 → 131.169.51.35 NFS V4 Call GETDEVINFO
    8 0.015014216 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 7) GETDEVINFO
        Opcode: GETDEVINFO (47)
            Status: NFS4_OK (0)
            layout type: LAYOUT4_NFSV4_1_FILES (1)
            device index: 0
            r_netid: tcp
                length: 3
                contents: tcp
                fill bytes: opaque data
            r_addr: 131.169.51.50.93.197
                length: 20
                contents: 131.169.51.50.93.197
            r_netid: tcp
                length: 3
                contents: tcp
                fill bytes: opaque data
            r_addr: 131.169.51.50.93.197
                length: 20
                contents: 131.169.51.50.93.197
            notification bitmap: 6
            notification bitmap: 0
    [Main Opcode: GETDEVINFO (47)]

    9 0.105442455 131.169.251.53 → 131.169.51.35 NFS V4 Call TEST_STATEID
   10 0.105521354 131.169.51.35 → 131.169.251.53 NFS V4 Reply (Call In 9) TEST_STATEID



Note that 131.169.51.50.93.197 corresponds to port 24005 (93 * 256 + 197).
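
For anyone who wants to double-check the port arithmetic: in the universal address format used for r_addr, the last two dot-separated fields are the high and low bytes of the port. Below is a minimal Python sketch of that conversion. The helper names parse_uaddr and format_uaddr are mine for illustration only; they are not part of any NFS tool, and the sketch handles only IPv4 (netid "tcp") addresses.

```python
from typing import Tuple


def parse_uaddr(uaddr: str) -> Tuple[str, int]:
    """Split an IPv4 universal address "h1.h2.h3.h4.p1.p2" into (host, port).

    Hypothetical helper for illustration; IPv4 only.
    """
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError("expected 6 dot-separated fields, got %d" % len(parts))
    octets = [int(p) for p in parts]
    if any(o < 0 or o > 255 for o in octets):
        raise ValueError("every field must be in the range 0..255")
    host = ".".join(parts[:4])
    # The port is carried as two bytes: high byte first, then low byte.
    port = octets[4] * 256 + octets[5]
    return host, port


def format_uaddr(host: str, port: int) -> str:
    """Inverse of parse_uaddr: build the universal address for host:port."""
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    return "%s.%d.%d" % (host, port >> 8, port & 0xFF)


if __name__ == "__main__":
    # The address returned by the MDS in GETDEVINFO above.
    print(parse_uaddr("131.169.51.50.93.197"))
    # What the r_addr would look like for a DS on port 24006.
    print(format_uaddr("131.169.51.50", 24006))
```

So a DS listening on port 24006 would have been advertised as 131.169.51.50.93.198, which is not what the GETDEVINFO reply above contains.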

Client <=> DS:

$ tshark -r ds-write.pcap  -n -z conv,tcp
    1   0.000000 131.169.251.53 → 131.169.51.50 NFS V4 Call WRITE StateID: 0xff01 Offset: 0 Len: 3968
    2   0.000090 131.169.51.50 → 131.169.251.53 NFS V4 Reply (Call In 1) WRITE Status: NFS4ERR_BAD_STATEID
================================================================================
TCP Conversations
Filter:<No Filter>
                                                           |       <-      | |       ->      | |     Total     |    Relative    |   Duration   |
                                                           | Frames  Bytes | | Frames  Bytes | | Frames  Bytes |      Start     |              |
131.169.51.50:24006        <-> 131.169.251.53:847               1      4240       1       168       2      4408     0.000000000         0.0001
================================================================================

Note that the client talks to the DS on port 24006, not 24005!

Is there a known fix that is missing in CentOS 7? I can't reproduce it with
a 4.9 kernel (or it is harder to reproduce).


The packet captures are attached.

Tigran.

Attachment: ds-write.pcapng
Description: application/pcapng

Attachment: mds.pcapng
Description: application/pcapng

