On Tue, 2017-03-14 at 14:06 +0200, Yishai Hadas wrote:
> On 3/13/2017 7:00 PM, Jason Gunthorpe wrote:
> >
> > On Mon, Mar 13, 2017 at 04:53:47PM +0200, Yishai Hadas wrote:
> > >
> > > From: Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>
> > >
> > > For x86, the serialization within the spin lock is enough to
> > > strongly order WC and other memory types.
> > >
> > > Add a new barrier named 'mmio_wc_spinlock' to optimize that.
> >
> > Please use this patch with the commentary instead:
>
> OK, the pull request was updated with the below.
> https://github.com/linux-rdma/rdma-core/pull/95

Thanks, I've merged this pull request.

> > diff --git a/util/udma_barrier.h b/util/udma_barrier.h
> > index 9e73148af8d5b6..cfe0459d7f6fff 100644
> > --- a/util/udma_barrier.h
> > +++ b/util/udma_barrier.h
> > @@ -33,6 +33,8 @@
> >  #ifndef __UTIL_UDMA_BARRIER_H
> >  #define __UTIL_UDMA_BARRIER_H
> >
> > +#include <pthread.h>
> > +
> >  /* Barriers for DMA.
> >
> >     These barriers are explicitly only for use with user DMA operations. If you
> > @@ -222,4 +224,37 @@
> >   */
> >  #define mmio_ordered_writes_hack() mmio_flush_writes()
> >
> > +/* Write Combining Spinlock primitive
> > +
> > +   Any access to a multi-value WC region must ensure that multiple cpus do not
> > +   write to the same values concurrently; these macros make that
> > +   straightforward and efficient if the chosen exclusion is a spinlock.
> > +
> > +   The spinlock guarantees that the WC writes issued within the critical
> > +   section are made visible as TLPs to the device. The TLPs must be seen by the
> > +   device strictly in the order that the spinlocks are acquired, and combining
> > +   WC writes between different sections is not permitted.
> > +
> > +   Use of these macros allows the fencing inside the spinlock to be combined
> > +   with the fencing required for DMA.
> > + */
> > +static inline void mmio_wc_spinlock(pthread_spinlock_t *lock)
> > +{
> > +	pthread_spin_lock(lock);
> > +#if !defined(__i386__) && !defined(__x86_64__)
> > +	/* For x86 the serialization within the spin lock is enough to
> > +	 * strongly order WC and other memory types. */
> > +	mmio_wc_start();
> > +#endif
> > +}
> > +
> > +static inline void mmio_wc_spinunlock(pthread_spinlock_t *lock)
> > +{
> > +	/* It is possible that on x86 the atomic in the lock is strong enough
> > +	 * to force-flush the WC buffers quickly, and this SFENCE can be
> > +	 * omitted too. */
> > +	mmio_flush_writes();
> > +	pthread_spin_unlock(lock);
> > +}
> > +
> >  #endif

--
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD