On 2/27/2017 8:00 PM, Jason Gunthorpe wrote:
On Mon, Feb 27, 2017 at 12:56:33PM +0200, Yishai Hadas wrote:
On 2/16/2017 9:23 PM, Jason Gunthorpe wrote:
The mlx5 comments are good so these translate fairly directly.
There is one barrier in mlx5_arm_cq that I could not explain; it became
mmio_ordered_writes_hack().
Signed-off-by: Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
#ifdef MLX5_DEBUG
{
@@ -1283,14 +1283,14 @@ int mlx5_arm_cq(struct ibv_cq *ibvcq, int solicited)
* Make sure that the doorbell record in host memory is
* written before ringing the doorbell via PCI MMIO.
*/
- wmb();
+ udma_to_device_barrier();
doorbell[0] = htonl(sn << 28 | cmd | ci);
doorbell[1] = htonl(cq->cqn);
mlx5_write64(doorbell, ctx->uar[0] + MLX5_CQ_DOORBELL, &ctx->lock32);
- wc_wmb();
+ mmio_ordered_writes_hack();
We expect to use the new "mmio_flush_writes()" macro here instead of the
above "hack" one. This barrier forces the data to be flushed immediately
to the device so that the CQ is armed without delay.
Hmm.....
Is it even possible to 'speed up' writes to UC memory? (The UAR is
UC, right?)
No, the UAR is mapped write-combining; that's why we need
mmio_flush_writes() here, to make sure that the device sees the data
without delay.
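To illustrate why a flush matters on a write-combining mapping but not on a UC one, here is a toy model. It is entirely hypothetical and does not describe how any particular CPU implements write combining. Stores to the WC region collect in a CPU-side buffer and reach the device only when a full line has been written or the buffer is explicitly flushed. The 64-byte line size and all names (toy_device, toy_wc_buf, wc_store, wc_flush) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WC_LINE 64  /* assumed size of one write-combining buffer, in bytes */

/* What the device has actually received. */
struct toy_device {
	uint8_t  reg[WC_LINE];
	unsigned writes_seen;   /* number of bursts delivered to the device */
};

/* CPU-side write-combining buffer in front of the device. */
struct toy_wc_buf {
	uint8_t  data[WC_LINE];
	uint8_t  valid[WC_LINE];   /* which bytes hold buffered data */
	size_t   pending;          /* count of buffered bytes */
	struct toy_device *dev;
};

/* Drain buffered bytes to the device (the role mmio_flush_writes() plays). */
static void wc_flush(struct toy_wc_buf *wc)
{
	if (wc->pending == 0)
		return;
	for (size_t i = 0; i < WC_LINE; i++)
		if (wc->valid[i])
			wc->dev->reg[i] = wc->data[i];
	wc->dev->writes_seen++;
	memset(wc->valid, 0, sizeof wc->valid);
	wc->pending = 0;
}

/* A store to the WC mapping: buffered, not sent to the device right away. */
static void wc_store(struct toy_wc_buf *wc, size_t off, const void *src, size_t len)
{
	const uint8_t *p = src;

	assert(off + len <= WC_LINE);
	for (size_t i = 0; i < len; i++) {
		if (!wc->valid[off + i])
			wc->pending++;
		wc->valid[off + i] = 1;
		wc->data[off + i] = p[i];
	}
	/* In this model only a completely written line drains on its own. */
	if (wc->pending == WC_LINE)
		wc_flush(wc);
}
```

An 8-byte doorbell never fills a 64-byte line, so in this model it sits in the buffer until something flushes it. That is the delay the explicit flush after mlx5_write64() is meant to avoid.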