On 17.02.2021 23:37, Olga Kornievskaia wrote:
> On Tue, Feb 16, 2021 at 5:27 PM Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx> wrote:
>> On 16.02.2021 21:37, Timo Rothenpieler wrote:
>>> I can't get a network trace (I assume just TCP/20049 is fine, and not also some RDMA trace?) right now, but I will once a user has finished their work on the machine.
>> There wasn't any TCP traffic to dump on the NFSoRDMA port, probably because everything is handled via RDMA/IB.
> Yeah, I'm not sure if tcpdump can snoop on the IB traffic. I know that upstream tcpdump can snoop on an RDMA Mellanox card (but I only know about the RoCE mode).
I managed to get https://github.com/Mellanox/ibdump working. Attached is what it records when I run the xfs_io copy_range command that gets stuck (sniffer.pcap). Additionally, I rebooted the client machine and captured the traffic of a copy that then succeeds during the first few minutes of uptime (sniffer2.pcap).
Both of those commands were run on the same 500M file.
>>> But I recorded a trace log of rpcrdma and sunrpc observing the situation. To me it looks like the COPY task (task:15886@7) completes successfully? The compressed trace.dat is attached.
> I'm having a hard time reproducing the problem. But I only tried "xfs", "btrfs", "ext4" (first two send a CLONE since the file system supports it), the last one exercises a copy. In all my tries your
I can also reproduce this on a test NFS share from an ext4 filesystem. Have not tested xfs yet.
> xfs_io commands succeed. The differences between our environments are (1) ZFS vs (xfs, etc) and (2) IB vs RoCE. Question is: does any copy_file_range work over RDMA/IB? One thing to try: a synchronous
It works, on any size of file, when the client machine is freshly booted (within its first 10~30 minutes of uptime).
> copy: create a small 10-byte file and do a copy. Is this the case where we have the copy and the callback racing? So instead do a really large copy: create a >=1GB file and do a copy. That will be an async copy but will not have a racy condition. Can you try those 2 examples for me?
I have observed in the past that the xfs_io copy is more likely to succeed the smaller the file is, though I could not make out a definite pattern.
I did some bisecting on the number of bytes, and came up with the following: a 2097153-byte file gets stuck, while a 2097152-byte (= 2^21) one still works.
It's been stable at that cutoff point for a while now, so I think that's actually the point where it starts happening, and different behaviour I saw in the past was an issue in my testing.
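The bisection over file sizes can be sketched roughly as follows. This is only an illustration of the search, not the exact procedure used here: `works` is a hypothetical predicate that would create a file of the given size, run the copy with a timeout, and report whether it finished, and the search assumes that every size up to the cutoff works and every size above it gets stuck.

```python
def find_cutoff(works, lo, hi):
    """Return the largest size for which works(size) is True.

    Assumes works(lo) is True, works(hi) is False, and that the outcome
    flips exactly once between lo and hi (hypothetical predicate).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid   # mid still copies fine, cutoff is at or above mid
        else:
            hi = mid   # mid gets stuck, cutoff is below mid
    return lo
```

With a predicate that matches the observation above (sizes up to 2^21 bytes succeed), a search between 1 byte and the 500M file size needs fewer than 30 copy attempts.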
> Not sure how useful tracepoints are here. The result of the COPY isn't interesting, as this is an async copy. The server should have sent a CB_COMPOUND with the copy's results. The process stack tells me that COPY is waiting for the results (waiting for the callback). So the question is: is there a problem with sending a callback over RDMA/IB? Or did the client receive it and miss it somehow? We really do need some better tracepoints in the copy (but we don't have them currently). Would you be willing to install the upstream libpcap/tcpdump to see if it can capture RDMA/IB traffic, or perhaps Chuck knows for sure that it doesn't work?
Managed to get ibdump working, as stated above.
Attachment: sniffer.pcap (binary data)
Attachment: sniffer2.pcap (binary data)