On Mon, Aug 03, 2020 at 12:24:21PM -0400, Chuck Lever wrote:
> Hi Timo-
>
> > On Aug 3, 2020, at 11:05 AM, Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > I have just deployed a new system with Mellanox ConnectX-4 VPI EDR IB cards and wanted to set up NFS over RDMA on it.
> >
> > However, while mounting the FS over RDMA works fine, actually using it results in the following messages absolutely hammering dmesg on both client and server:
> >
> >> https://gist.github.com/BtbN/9582e597b6581f552fa15982b0285b80#file-server-log
> >
> > The spam only stops once I forcibly reboot the client. The filesystem gets nowhere during all this. The retrans counter in nfsstat just keeps going up; nothing actually gets done.
> >
> > This is on Linux 5.4.54, using nfs-utils 2.4.3.
> > The mlx5 driver had enhanced mode disabled in order to enable IPoIB connected mode with an MTU of 65520.
> >
> > Normal NFS 4.2 over TCP works perfectly fine on this setup; it's only when I mount via RDMA that things go wrong.
> >
> > Is this an issue on my end, or did I run into a bug somewhere here?
> > Any pointers, patches, and solutions to test are welcome.
>
> I haven't seen that failure mode here, so the best I can recommend is to keep investigating. I've copied linux-rdma in case they have any advice.

The mention of IPoIB is slightly confusing in the context of NFS-over-RDMA. Are you running NFS over IPoIB?

From a brief look at the CQE error syndrome (local length error), the client sends a wrong WQE.

Thanks

> --
> Chuck Lever
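
For reference, a "local length error" is reported through the verbs API as the completion status IBV_WC_LOC_LEN_ERR, and it generally means the data did not fit the local buffer described by the work request's scatter/gather list; the raw syndrome from the CQE is carried in the vendor_err field of the work completion. Below is a minimal userspace sketch of where that status and syndrome would be observed when draining a completion queue. It assumes libibverbs is installed (compile with -libverbs); the device setup and the check_completions helper are illustrative only, not part of NFS/RDMA or any existing tool.

#include <stdio.h>
#include <infiniband/verbs.h>

/* Drain up to 16 completions from cq and report any error status.
 * IBV_WC_LOC_LEN_ERR is the status behind a "local length error";
 * wc.vendor_err holds the hardware's CQE error syndrome. */
static void check_completions(struct ibv_cq *cq)
{
        struct ibv_wc wc[16];
        int n = ibv_poll_cq(cq, 16, wc);

        for (int i = 0; i < n; i++) {
                if (wc[i].status != IBV_WC_SUCCESS)
                        fprintf(stderr, "wr_id %llu: %s, vendor syndrome 0x%x\n",
                                (unsigned long long)wc[i].wr_id,
                                ibv_wc_status_str(wc[i].status),
                                wc[i].vendor_err);
        }
}

int main(void)
{
        struct ibv_device **devs = ibv_get_device_list(NULL);
        if (!devs || !devs[0]) {
                fprintf(stderr, "no RDMA devices found\n");
                return 1;
        }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        if (!ctx)
                return 1;

        struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
        if (cq) {
                check_completions(cq);  /* nothing posted here, so prints nothing */
                ibv_destroy_cq(cq);
        }

        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
}

In a real ULP such as the RPC-over-RDMA transport, the equivalent status check happens in the kernel's completion handlers, which is presumably where the repeated dmesg messages in the linked gist originate.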