Hi Sagi, I think we need to add a fence to the UMR WQE, so let's try this one:

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ad8a263..c38c4fa 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3737,8 +3737,7 @@ static void dump_wqe(struct mlx5_ib_qp *qp, int idx, int size_16)
 static u8 get_fence(u8 fence, struct ib_send_wr *wr)
 {
-	if (unlikely(wr->opcode == IB_WR_LOCAL_INV &&
-		     wr->send_flags & IB_SEND_FENCE))
+	if (wr->opcode == IB_WR_LOCAL_INV || wr->opcode == IB_WR_REG_MR)
 		return MLX5_FENCE_MODE_STRONG_ORDERING;
 
 	if (unlikely(fence)) {
This will kill performance. Isn't there another fix that can be applied just for the retransmission flow?
I couldn't reproduce that case, but I ran some initial tests in my lab (with my patch above) on non-performance servers:

Initiator: 24 CPUs (2 threads/core, 6 cores/socket, 2 sockets), Connect-IB (same driver, mlx5_ib), kernel 4.10.0, fio test with 24 jobs and iodepth 128, register_always=N.
Target: 1 subsystem with 1 ns (null_blk).

bs    read (without/with patch)    write (without/with patch)
---   --------------------------   ---------------------------
512   1019k / 1008k                1004k / 992k
1k    1021k / 1013k                1002k / 991k
4k    1030k / 1022k                 978k / 969k

CPU usage is 100% in both cases on the initiator side. I haven't seen a difference with bs = 16k. Not as big a drop as we would expect.
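For reference, the run described above roughly corresponds to a fio job file like the sketch below. Only the job count, iodepth, and block sizes are stated in the thread; the device path, I/O engine, and runtime are assumptions for illustration:

```ini
; Hypothetical fio job approximating the test above.
; /dev/nvme0n1, libaio, and the 60s runtime are assumptions,
; not taken from the thread.
[global]
ioengine=libaio
direct=1
iodepth=128
numjobs=24
runtime=60
time_based
group_reporting
filename=/dev/nvme0n1

[randread-4k]
rw=randread
bs=4k
```

The read/write columns would come from separate runs with rw=randread and rw=randwrite at each block size.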
Obviously you won't see a drop without registering memory for small I/O (register_always=N), since that bypasses registration altogether... Please retest with register_always=Y.