On Mon, Feb 27, 2017 at 12:56:33PM +0200, Yishai Hadas wrote:
> On 2/16/2017 9:23 PM, Jason Gunthorpe wrote:
> >The mlx5 comments are good so these translate fairly directly.
> >
> >There is one barrier in mlx5_arm_cq that I could not explain, it became
> >mmio_ordered_writes_hack()
> >
> >Signed-off-by: Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
>
> >  #ifdef MLX5_DEBUG
> >      {
> >@@ -1283,14 +1283,14 @@ int mlx5_arm_cq(struct ibv_cq *ibvcq, int solicited)
> >       * Make sure that the doorbell record in host memory is
> >       * written before ringing the doorbell via PCI MMIO.
> >       */
> >-    wmb();
> >+    udma_to_device_barrier();
> >
> >      doorbell[0] = htonl(sn << 28 | cmd | ci);
> >      doorbell[1] = htonl(cq->cqn);
> >
> >      mlx5_write64(doorbell, ctx->uar[0] + MLX5_CQ_DOORBELL, &ctx->lock32);
> >
> >-    wc_wmb();
> >+    mmio_ordered_writes_hack();
>
> We expect to use here the "mmio_flush_writes()" new macro, instead of the
> above "hack" one. This barrier enforces the data to be flushed immediately
> to the device so that the CQ will be armed with no delay.

Hmm.....

Is it even possible to 'speed up' writes to UC memory? (uar is UC,
right?)

Be aware of the trade-off: a barrier may stall the CPU until the UC
writes progress far enough, but that stall is pointless if the barrier
doesn't also 'speed up' the write.

Also, the usual implementation of mlx5_write64 includes that spinlock,
which already has a serializing atomic in it, so it is doubtful that
the wc_wmb() ever actually did anything.

Do you have any hard information one way or another?

IMHO, if there is a way to speed up UC writes then it should have its
own macro, e.g. mmio_flush_uc_writes(), and it probably should be called
within the mlx5_write64 implementation before releasing the spinlock.

But, AFAIK, there is no way to do that on x86-64...
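To make the suggestion concrete, here is a rough sketch of the shape I
have in mind. It is not the actual mlx5 code: the sketch_* names are
made up for illustration, the lock is a plain C11 atomic_flag standing in
for ctx->lock32, and the body of mmio_flush_uc_writes() is only a
placeholder full fence, since no real UC flush primitive is known to
exist here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical macro from the proposal above. The body is a placeholder
 * (a full fence) so the sketch compiles; it is NOT a real UC flush. */
#define mmio_flush_uc_writes() atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical spinlock standing in for ctx->lock32. */
struct sketch_spinlock {
	atomic_flag locked;
};

static void sketch_spin_lock(struct sketch_spinlock *l)
{
	/* Atomic read-modify-write; on x86-64 this compiles to a locked
	 * instruction, which is the "serializing atomic" in question. */
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;
}

static void sketch_spin_unlock(struct sketch_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Write a 64-bit doorbell as two 32-bit stores under a lock, so that
 * two threads cannot interleave their halves. */
static void sketch_write64(const uint32_t val[2], volatile uint32_t *dest,
			   struct sketch_spinlock *lock)
{
	assert(val && dest && lock);

	sketch_spin_lock(lock);
	dest[0] = val[0];
	dest[1] = val[1];
	/* If a UC flush existed, this is where it would go: inside the
	 * write helper, before the lock is released. */
	mmio_flush_uc_writes();
	sketch_spin_unlock(lock);
}
```

With this shape the caller would need no barrier after the write helper
returns: any flush lives in one place, and the lock's atomic already
orders the stores with respect to whatever follows.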

Jason