While running xfstests on an NFS/RDMA mount, I see this in the client's /var/log/messages multiple times:

Jun 22 14:13:45 manet kernel: mlx5_0:dump_cqe:275:(pid 0): dump error cqe
Jun 22 14:13:45 manet kernel: 00000000 00000000 00000000 00000000
Jun 22 14:13:45 manet kernel: 00000000 00000000 00000000 00000000
Jun 22 14:13:45 manet kernel: 00000000 00000000 00000000 00000000
Jun 22 14:13:45 manet kernel: 00000000 08007806 250000cd 024027d3
Jun 22 14:13:45 manet kernel: rpcrdma: fastreg: memory management operation error (6/0x78)

As far as I can tell the client is able to recover and continue the test. However, this error is not supposed to happen in normal operation.

This is with a Mellanox CX4 in RoCEv1 mode, v4.12-rc2.
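For anyone who wants to read the dumped CQE by hand, here is a rough Python sketch that pulls out the fields of interest. The byte offsets are an assumption based on my reading of struct mlx5_err_cqe in include/linux/mlx5/device.h (vendor_err_synd at byte 54, syndrome at 55, s_wqe_opcode_qpn at 56, wqe_counter at 60, op_own at 63), and decode_err_cqe is a made-up helper name, not a kernel or library function. It also assumes the dump prints the CQE as big-endian 32-bit words in memory order.

```python
import struct

# The four dump rows from the log above, with the syslog prefix stripped.
DUMP_ROWS = [
    "00000000 00000000 00000000 00000000",
    "00000000 00000000 00000000 00000000",
    "00000000 00000000 00000000 00000000",
    "00000000 08007806 250000cd 024027d3",
]


def decode_err_cqe(rows):
    """Decode selected fields of a 64-byte mlx5 error CQE dump.

    Offsets are assumed from struct mlx5_err_cqe; treat them as a sketch.
    """
    # Each word is printed as big-endian hex, so concatenating the words
    # gives back the CQE bytes in memory order.
    raw = b"".join(bytes.fromhex(word) for row in rows for word in row.split())
    if len(raw) != 64:
        raise ValueError("expected a 64-byte CQE, got %d bytes" % len(raw))

    (opcode_qpn,) = struct.unpack(">I", raw[56:60])
    (wqe_counter,) = struct.unpack(">H", raw[60:62])

    return {
        "vendor_err_synd": raw[54],
        "syndrome": raw[55],
        "wqe_opcode": opcode_qpn >> 24,      # opcode of the failed WQE
        "qpn": opcode_qpn & 0xFFFFFF,        # QP number
        "wqe_counter": wqe_counter,
        "cqe_opcode": raw[63] >> 4,          # upper nibble of op_own
    }


if __name__ == "__main__":
    for name, value in decode_err_cqe(DUMP_ROWS).items():
        print("%-16s 0x%x" % (name, value))
```

With the dump above this gives syndrome 0x6 and vendor syndrome 0x78, which matches the "(6/0x78)" in the rpcrdma message, and a WQE opcode of 0x25, which, if I read the mlx5 headers correctly, is the UMR opcode used for fast registration.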
Is this a regression? What kernel version are you running? FW revision?

Is the below commit applied?

commit 6e8484c5cf07c7ee632587e98c1a12d319dacb7c
Author: Max Gurtovoy <maxg@xxxxxxxxxxxx>
Date:   Sun May 28 10:53:11 2017 +0300

    RDMA/mlx5: set UMR wqe fence according to HCA cap

    Cache the needed umr_fence and set the wqe ctrl segment accordingly.

    Signed-off-by: Max Gurtovoy <maxg@xxxxxxxxxxxx>
    Acked-by: Leon Romanovsky <leon@xxxxxxxxxx>
    Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
    Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>

This is the only thing that changed in that area lately... Can you try without it?