Hi Ursula,

Regarding the suggestion: the loop still needs to check for len >= MLX4_INLINE_ALIGN - off, because 44 = 64 - 20. That is, len == MLX4_INLINE_ALIGN - off is still a valid case that must close the current inline segment. But I agree that it shouldn't do a second memcpy with zero length, so there should be an additional check for len != 0.

Coming to the IB_SEND_INLINE part: when ib_create_qp() is called and the HCA doesn't support the requested cap.max_inline_data, the provider HCA driver is supposed to fail the call, and the ULP is expected to fall back to a non-inline scheme. As it appears, the mlx4 driver is not failing this call, which is a bug that needs fixing. Instead of failing the call, I prefer to provide the inline data path sooner, based on my patch earlier in this email thread.

Parav

> -----Original Message-----
> From: Ursula Braun [mailto:ubraun@xxxxxxxxxxxxxxxxxx]
> Sent: Thursday, March 16, 2017 6:51 AM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Eli Cohen <eli@xxxxxxxxxxxx>; Matan Barak <matanb@xxxxxxxxxxxx>
> Cc: Saeed Mahameed <saeedm@xxxxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>
> Hi Parav,
>
> I ran your new mlx4 code together with changed SMC-R code that no longer maps the IB_SEND_INLINE area. It worked - great!
>
> Below I have added a small improvement idea in your patch.
>
> Nevertheless I am still not sure whether I should keep the IB_SEND_INLINE flag in the SMC-R code, since there is no guarantee that this will work with all kinds of RoCE devices. The maximum length for IB_SEND_INLINE depends on the RoCE driver - right? Is there an interface to determine such a maximum length? Would ib_create_qp() return with an error if the SMC-R specified .cap.max_inline_data = 44 is not supported by a RoCE driver?
>
> On 03/14/2017 04:24 PM, Parav Pandit wrote:
> > Hi Ursula,
> >
> >
> >> -----Original Message-----
> >> From: Ursula Braun [mailto:ubraun@xxxxxxxxxxxxxxxxxx]
> >> Sent: Tuesday, March 14, 2017 10:02 AM
> >> To: Parav Pandit <parav@xxxxxxxxxxxx>; Eli Cohen <eli@xxxxxxxxxxxx>; Matan Barak <matanb@xxxxxxxxxxxx>
> >> Cc: Saeed Mahameed <saeedm@xxxxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
> >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>
> >> Hi Parav,
> >>
> >> I tried your mlx4 patch together with SMC on s390x, but it failed.
> >> The SMC-R code tries to send 44 bytes as inline in 1 sge.
> >> I wonder about a length check with 16 bytes, which probably explains the failure.
> >> See my question below in the patch:
> >>
> >> On 03/12/2017 09:20 PM, Parav Pandit wrote:
> >>> Hi Ursula,
> >>>
> >>>> -----Original Message-----
> >>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Ursula Braun
> >>>> Sent: Thursday, March 9, 2017 3:54 AM
> >>>> To: Eli Cohen <eli@xxxxxxxxxxxx>; Matan Barak <matanb@xxxxxxxxxxxx>
> >>>> Cc: Saeed Mahameed <saeedm@xxxxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
> >>>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>>>
> >>>>
> >>>>
> >>>> On 03/06/2017 02:08 PM, Eli Cohen wrote:
> >>>>>>>
> >>>>>>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
> >>>>>>> The address provided by the SMC code in struct ib_send_wr *wr is an address belonging to an area mapped with the ib_dma_map_single() call. On s390x those kinds of addresses require extra access functions (see arch/s390/include/asm/io.h).
> >>>>>>>
> >>>>>
> >>>>> By definition, when you are posting a send request with inline, the address must be mapped to the CPU, so plain memcpy should work.
> >>>>>
> >>>> In the past I ran SMC-R with ConnectX-3 cards. The mlx4 driver does not seem to contain extra coding for the IB_SEND_INLINE flag for ib_post_send. Does this mean that for SMC-R to run on ConnectX-3 cards the IB_SEND_INLINE flag is ignored, and thus I needed the ib_dma_map_single() call for the area used with ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE flag if I want to run the same SMC-R code with both ConnectX-3 and ConnectX-4 cards?
> >>>>
> >>> I had encountered the same kernel panic that you mentioned last week on ConnectX-4 adapters with SMC-R on x86_64.
> >>> Shall I submit the fix below to the netdev mailing list?
> >>> I have tested the change below. I also have an optimization that avoids dma mapping for wr_tx_dma_addr.
> >>>
> >>> -		lnk->wr_tx_sges[i].addr =
> >>> -			lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> >>> +		lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i);
> >>>
> >>> I also have a fix for processing IB_SEND_INLINE in the mlx4 driver on a little older kernel base.
> >>> I have attached it below. I can rebase my kernel and provide a fix in the mlx5_ib driver.
> >>> Let me know.
> >>>
> >>> Regards,
> >>> Parav Pandit
> >>>
> >>> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> >>> index a2e4ca5..0d984f5 100644
> >>> --- a/drivers/infiniband/hw/mlx4/qp.c
> >>> +++ b/drivers/infiniband/hw/mlx4/qp.c
> >>> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> >>>  	unsigned long flags;
> >>>  	int nreq;
> >>>  	int err = 0;
> >>> +	int inl = 0;
> >>>  	unsigned ind;
> >>>  	int uninitialized_var(stamp);
> >>>  	int uninitialized_var(size);
> >>> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> >>>  		default:
> >>>  			break;
> >>>  		}
> >>> +		if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> >>> +			struct mlx4_wqe_inline_seg *seg;
> >>> +			void *addr;
> >>> +			int len, seg_len;
> >>> +			int num_seg;
> >>> +			int off, to_copy;
> >>> +
> >>> +			inl = 0;
> >>> +
> >>> +			seg = wqe;
> >>> +			wqe += sizeof *seg;
> >>> +			off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> >>> +			num_seg = 0;
> >>> +			seg_len = 0;
> >>> +
> >>> +			for (i = 0; i < wr->num_sge; ++i) {
> >>> +				addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> >>> +				len = wr->sg_list[i].length;
> >>> +				inl += len;
> >>> +
> >>> +				if (inl > 16) {
> >>> +					inl = 0;
> >>> +					err = -ENOMEM;
> >>> +					*bad_wr = wr;
> >>> +					goto out;
> >>> +				}
> >> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for IB_SEND_INLINE data?
> >> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function does not seem to return an error.
> >>>
> > This check should be against the max_inline_data variable of the QP.
> > This was just an error check; I should have fixed it. I was testing with NVMe, where the inline data was only worth 16 bytes.
> > I will fix this. Is it possible to change it to 44 and do a quick test?
> > The final patch will have the right check, in addition to the check in create_qp.
> >
> >>> -		/*
> >>> -		 * Write data segments in reverse order, so as to
> >>> -		 * overwrite cacheline stamp last within each
> >>> -		 * cacheline. This avoids issues with WQE
> >>> -		 * prefetching.
> >>> -		 */
> >>> +				while (len >= MLX4_INLINE_ALIGN - off) {
> With this code there are 2 memcpy calls, one with to_copy=44 and the next one with len 0.
> I suggest changing the check to "len > MLX4_INLINE_ALIGN - off".
> >>> +					to_copy = MLX4_INLINE_ALIGN - off;
> >>> +					memcpy(wqe, addr, to_copy);
> >>> +					len -= to_copy;
> >>> +					wqe += to_copy;
> >>> +					addr += to_copy;
> >>> +					seg_len += to_copy;
> >>> +					wmb(); /* see comment below */
> >>> +					seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> >>> +					seg_len = 0;
> >>> +					seg = wqe;
> >>> +					wqe += sizeof *seg;
> >>> +					off = sizeof *seg;
> >>> +					++num_seg;
> >>> +				}
> >>>
> >>> -		dseg = wqe;
> >>> -		dseg += wr->num_sge - 1;
> >>> -		size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> >>> +				memcpy(wqe, addr, len);
> >>> +				wqe += len;
> >>> +				seg_len += len;
> >>> +				off += len;
> >>> +			}
> >>>
> >>> -		/* Add one more inline data segment for ICRC for MLX sends */
> >>> -		if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> >>> -			     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> >>> -			     qp->mlx4_ib_qp_type &
> >>> -			     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> >>> -			set_mlx_icrc_seg(dseg + 1);
> >>> -			size += sizeof (struct mlx4_wqe_data_seg) / 16;
> >>> -		}
> >>> +			if (seg_len) {
> >>> +				++num_seg;
> >>> +				/*
> >>> +				 * Need a barrier here to make sure
> >>> +				 * all the data is visible before the
> >>> +				 * byte_count field is set. Otherwise
> >>> +				 * the HCA prefetcher could grab the
> >>> +				 * 64-byte chunk with this inline
> >>> +				 * segment and get a valid (!=
> >>> +				 * 0xffffffff) byte count but stale
> >>> +				 * data, and end up sending the wrong
> >>> +				 * data.
> >>> +				 */
> >>> +				wmb();
> >>> +				seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> >>> +			}
> >>>
> >>> -		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> >>> -			set_data_seg(dseg, wr->sg_list + i);
> >>> +			size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> >>> +		} else {
> >>> +			/*
> >>> +			 * Write data segments in reverse order, so as to
> >>> +			 * overwrite cacheline stamp last within each
> >>> +			 * cacheline. This avoids issues with WQE
> >>> +			 * prefetching.
> >>> +			 */
> >>> +
> >>> +			dseg = wqe;
> >>> +			dseg += wr->num_sge - 1;
> >>> +			size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> >>> +
> >>> +			/* Add one more inline data segment for ICRC for MLX sends */
> >>> +			if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> >>> +				     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> >>> +				     qp->mlx4_ib_qp_type &
> >>> +				     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> >>> +				set_mlx_icrc_seg(dseg + 1);
> >>> +				size += sizeof (struct mlx4_wqe_data_seg) / 16;
> >>> +			}
> >>>
> >>> +			for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> >>> +				set_data_seg(dseg, wr->sg_list + i);
> >>> +		}
> >>>  		/*
> >>>  		 * Possibly overwrite stamping in cacheline with LSO
> >>>  		 * segment only after making sure all data segments
> >>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html