Re: copy_file_range() infinitely hangs on NFSv4.2 over RDMA

On 18.02.2021 19:30, Olga Kornievskaia wrote:
> Thank you for getting tracepoints from a busy server, but can you get
> more? As suspected, the server is having issues sending the callback.
> I'm not sure why. Any chance you could turn on the server's
> tracepoints, probably both the sunrpc and rdma tracepoints? I wonder if
> we can get any more info about why it's failing.

I have now isolated two of the machines on that cluster: one acts as the NFS server, exporting from an ext4 mount, and the other is the same client as before. That way I managed to capture a trace and an ibdump of an entire cycle: mount, then a successful copy, then, 5 minutes later, a copy that got stuck.

There was next to no unrelated traffic during those traces; you can find them attached.

Another observation from this setup: unmounting and remounting the NFS share also gets it back into working condition for a while; no reboot is necessary. During this trace I got "lucky": it got stuck after just 5 minutes of waiting.

Before that, I had a run where I mounted the share and then tried a copy every 5 minutes; it ran for 45 minutes without getting stuck, at which point I decided to remount once more.

Attachment: sniffer.pcap.xz
Description: Binary data

Attachment: trace.dat.xz
Description: Binary data

