On 3/9/2017 7:03 PM, Jason Gunthorpe wrote:
> I honestly think you are trying far too much to pointlessly preserve
> the exact original code...

At this stage we would like to fix the degradation in the barriers that
was introduced by the previous series. Further improvements will come as
incremental patches, after the required performance testing has been done.

> If you are going to send this as a patch please include my updated
> comment.

I added a comment in mlx4/mlx5 pointing out that this code is latency
oriented and flushes immediately.

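For illustration, here is a minimal sketch of that latency-oriented
pattern, assuming the mmio_wc_spinlock() helper quoted below and
rdma-core's mmio_flush_writes(). The ring_doorbell() wrapper and the
bf_lock/bf_reg/db_val names are made up for this example and are not the
actual provider code:

/* Sketch only; assumes pthread.h, stdint.h and the rdma-core barrier
 * helpers are in scope.  Take the lock with WC ordering, write the
 * doorbell into the WC-mapped register page, and flush right away
 * instead of deferring the flush to an unlock helper. */
static inline void ring_doorbell(pthread_spinlock_t *bf_lock,
				 volatile uint64_t *bf_reg, uint64_t db_val)
{
	mmio_wc_spinlock(bf_lock);	/* lock, plus mmio_wc_start() where needed */
	*bf_reg = db_val;		/* doorbell write to the WC page (real
					 * providers use their own write helpers) */
	mmio_flush_writes();		/* flush immediately to keep latency low */
	pthread_spin_unlock(bf_lock);	/* plain unlock, nothing deferred */
}
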
> +/* Do mmio_wc_start and grab a spinlock */
> +static inline void mmio_wc_spinlock(pthread_spinlock_t *lock)
> +{
> +	pthread_spin_lock(lock);
> +#if !defined(__i386__) && !defined(__x86_64__)
> +	/* For x86 the serialization within the spin lock is enough to
> +	 * strongly order WC and other memory types. */
> +	mmio_wc_start();
> +#endif
> +}

> I would like to see the unlock inline still present in the header for
> clarity to the reader what the expected pattern is, and a comment in
> mlx4/mlx5 indicating they are not using the unlock macro directly to
> try and reduce latency to the flush.

For now the unlock macro is not used by mlx4 & mlx5, as pointed out
before. In addition, some work is still needed to verify whether the
unlock is fully equivalent to flushing the data immediately, as you also
wondered. The lock macro is used for ordering, and as such it can be used
based on the Intel docs that were pointed to. If some provider turns out
to need this optimization, it can be added after the above work is
finalized.
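
For completeness, the unlock inline being discussed could look roughly
like the sketch below; the mmio_flush_writes_spinunlock() name and the
decision to keep an explicit flush are only assumptions for illustration,
since whether the unlock alone flushes the WC buffers quickly enough is
exactly the open question above:

/* Sketch only: flush the WC writes and release the spinlock.  The
 * explicit mmio_flush_writes() is the conservative choice until it is
 * verified that the unlock itself flushes the data promptly enough. */
static inline void mmio_flush_writes_spinunlock(pthread_spinlock_t *lock)
{
	mmio_flush_writes();
	pthread_spin_unlock(lock);
}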