On 3/9/2017 7:03 PM, Jason Gunthorpe wrote:
> I honestly think you are trying far too much to pointlessly preserve
> the exact original code...

At this stage we would like to fix the degradation in the barriers that
was introduced by the previous series. Further improvements will come as
incremental patches, after the required performance testing has been done.

> If you are going to send this as a patch please include my updated
> comment.

I added a comment in mlx4/mlx5 pointing out that this code is latency
oriented and flushes immediately.

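For illustration, here is a minimal sketch of that latency-oriented
pattern, assuming the mmio_wc_spinlock() helper quoted below and
rdma-core's mmio_flush_writes(). The ring_doorbell() wrapper and the
bf_lock/bf_reg/db_val names are made up for this example and are not the
actual provider code:

/* Sketch only; assumes pthread.h, stdint.h and the rdma-core barrier
 * helpers are in scope.  Take the lock with WC ordering, write the
 * doorbell into the WC-mapped register page, and flush right away
 * instead of deferring the flush to an unlock helper. */
static inline void ring_doorbell(pthread_spinlock_t *bf_lock,
				 volatile uint64_t *bf_reg, uint64_t db_val)
{
	mmio_wc_spinlock(bf_lock);	/* lock, plus mmio_wc_start() where needed */
	*bf_reg = db_val;		/* doorbell write to the WC page (real
					 * providers use their own write helpers) */
	mmio_flush_writes();		/* flush immediately to keep latency low */
	pthread_spin_unlock(bf_lock);	/* plain unlock, nothing deferred */
}
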
> +/* Do mmio_wc_start and grab a spinlock */
> +static inline void mmio_wc_spinlock(pthread_spinlock_t *lock)
> +{
> +	pthread_spin_lock(lock);
> +#if !defined(__i386__) && !defined(__x86_64__)
> +	/* For x86 the serialization within the spin lock is enough to
> +	 * strongly order WC and other memory types. */
> +	mmio_wc_start();
> +#endif
> +}

> I would like to see the unlock inline still present in the header for
> clarity to the reader what the expected pattern is, and a comment in
> mlx4/mlx5 indicating they are not using the unlock macro directly to
> try and reduce latency to the flush.

For now the unlock macro is not used by mlx4 & mlx5, as pointed out
before. In addition, some work is still needed to verify whether the
unlock is fully equivalent to flushing the data immediately, as you also
wondered. The lock macro is used for ordering, and as such it can be used
based on the Intel docs that were pointed to. If some provider turns out
to need this optimization, it can be added after the above work is
finalized.
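
For completeness, the unlock inline being discussed could look roughly
like the sketch below; the mmio_flush_writes_spinunlock() name and the
decision to keep an explicit flush are only assumptions for illustration,
since whether the unlock alone flushes the WC buffers quickly enough is
exactly the open question above:

/* Sketch only: flush the WC writes and release the spinlock.  The
 * explicit mmio_flush_writes() is the conservative choice until it is
 * verified that the unlock itself flushes the data promptly enough. */
static inline void mmio_flush_writes_spinunlock(pthread_spinlock_t *lock)
{
	mmio_flush_writes();
	pthread_spin_unlock(lock);
}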