Re: [PATCH rdma-core 07/14] mlx4: Update to use new udma write barriers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/7/2017 9:18 PM, Jason Gunthorpe wrote:
+static inline void mmio_wc_spinunlock(pthread_spinlock_t *lock)
+{
+	/* On x86 the lock is enough for strong ordering, but the SFENCE
+	 * encourages the WC buffers to flush out more quickly (Yishai:
+	 * confirm?) */

This macro can't do both and should be dropped, see above.

I don't understand this comment? Why can't it do both?

This macro made the flush and just later made the spin_unlock without letting any command to be executed between them, logically there is no reason for such a limitation.

As of that any command that needs the lock must be done before the flush and delays the hardware from see the BF data immediately.

Specifically, in both mlx4 and mlx5 the original code was to flush just after the BF copy then under the spinlock toggle the bf_offset (i.e. ctx->bf_offset ^= ctx->bf_buf_size) and finally unlock.

That's why this macro is bad, I'll drop this macro and preserve the original logic with the other new macros.

Similar logical issue might be with the mmio_wc_spinlock macro in case the driver wants to make some command after the lock before the start_mmio, currently there is no such use case so basically I'm fine with staying with it.

diff --git a/providers/mlx4/qp.c b/providers/mlx4/qp.c
index 77a4a34576cb69..a22fca7c6f1360 100644
--- a/providers/mlx4/qp.c
+++ b/providers/mlx4/qp.c
@@ -477,23 +477,20 @@ out:
 		ctrl->owner_opcode |= htonl((qp->sq.head & 0xffff) << 8);

 		ctrl->bf_qpn |= qp->doorbell_qpn;
+		++qp->sq.head;
+

This is a change comparing the original code where was wmb() before that line to enforce the compiler not to change the order so that ctrl->owner_opcode will get the correct value of sq.head. Will add here udma_to_device_barrier() which logically expects to do the same.

 		/*
 		 * Make sure that descriptor is written to memory
 		 * before writing to BlueFlame page.
 		 */
-		mmio_wc_start();
-
-		++qp->sq.head;
-
-		pthread_spin_lock(&ctx->bf_lock);
+		mmio_wc_spinlock(&ctx->bf_lock);

 		mlx4_bf_copy(ctx->bf_page + ctx->bf_offset, (unsigned long *) ctrl,
 			     align(size * 16, 64));
-		mmio_flush_writes();

 		ctx->bf_offset ^= ctx->bf_buf_size;

-		pthread_spin_unlock(&ctx->bf_lock);
+		mmio_wc_spinunlock(&ctx->bf_lock);
 	} else if (nreq) {
 		qp->sq.head += nreq;

diff --git a/providers/mlx5/qp.c b/providers/mlx5/qp.c
index d7087d986ce79f..0f1ec0ef2b094b 100644
--- a/providers/mlx5/qp.c
+++ b/providers/mlx5/qp.c
@@ -931,11 +931,11 @@ out:

 		/* Make sure that the doorbell write happens before the memcpy
 		 * to WC memory below */
-		mmio_wc_start();
-
 		ctx = to_mctx(ibqp->context);
-		if (bf->need_lock)
-			mlx5_spin_lock(&bf->lock);
+		if (bf->need_lock && !mlx5_single_threaded)

Can consider some pre-patch which will encapsulate the mlx5_single_threaded flag into the bf->need lock, it will prevent the extra if statement that this new code introduced in the data path.

+			mmio_wc_spinlock(&bf->lock.lock);
+		else
+			mmio_wc_start();

 		if (!ctx->shut_up_bf && nreq == 1 && bf->uuarn &&
 		    (inl || ctx->prefer_bf) && size > 1 &&
@@ -955,10 +955,11 @@ out:
 		 * the mmio_flush_writes is CPU local, this will result in the HCA seeing
 		 * doorbell 2, followed by doorbell 1.
 		 */
-		mmio_flush_writes();
 		bf->offset ^= bf->buf_size;
-		if (bf->need_lock)
-			mlx5_spin_unlock(&bf->lock);
+		if (bf->need_lock && !mlx5_single_threaded)

Same as of above.
+			mmio_wc_spinunlock(&bf->lock.lock);
+		else
+			mmio_flush_writes();
 	}

+static inline void mmio_wc_spinunlock(pthread_spinlock_t *lock)
+{
+	/* On x86 the lock is enough for strong ordering, but the SFENCE
+	 * encourages the WC buffers to flush out more quickly (Yishai:
+	 * confirm?) */
+	mmio_flush_writes();
+	pthread_spin_unlock(lock);

See above comment on that.
+}
+
 #endif


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux