On Thu, Apr 17, 2014 at 02:44:03PM +0100, Will Deacon wrote:
> Hello,
>
> This RFC series attempts to define a portable (i.e. cross-architecture)
> definition of the {readX,writeX}_relaxed MMIO accessor functions. These
> functions are already in widespread use amongst drivers (mainly those supporting
> devices embedded in ARM SoCs), but lack any well-defined semantics and,
> subsequently, any portable definitions to allow these drivers to be compiled for
> other architectures.
>
> The two main motivations for this series are:
>
>   (1) To promote use of the _relaxed MMIO accessors on weakly-ordered
>       architectures, where they can bring significant performance improvements
>       over their non-relaxed counterparts.
>
>   (2) To allow COMPILE_TEST to build drivers using the relaxed accessors across
>       all architectures.
>
> The proposed semantics largely match exactly those provided by the ARM
> implementation (i.e. no weaker), with one exception (see below).
>
> Informally:
>
>   - Relaxed accesses to the same device are ordered with respect to each other.
>
>   - Relaxed accesses are *not* guaranteed to be ordered with respect to normal
>     memory accesses (e.g. DMA buffers -- this is what gives us the performance
>     boost over the non-relaxed versions).
>
>   - Relaxed accesses are not guaranteed to be ordered with respect to
>     LOCK/UNLOCK operations.
>
> In actual fact, the relaxed accessors *are* ordered with respect to LOCK/UNLOCK
> operations on ARM[64], but I have added this constraint for the benefit of
> PowerPC, which has expensive I/O barriers in the spin_unlock path for the
> non-relaxed accessors.
>
> A corollary to this is that mmiowb() probably needs rethinking. As it currently
> stands, an mmiowb() is required to order MMIO writes to a device from multiple
> CPUs, even if that device is protected by a lock. However, this isn't often used
> in practice, leading to PowerPC implementing both mmiowb() *and* synchronising
> I/O in spin_unlock.
>
> I would propose making the non-relaxed I/O accessors ordered with respect to
> LOCK/UNLOCK, leaving mmiowb() to be used with the relaxed accessors, if
> required, but would welcome thoughts/suggestions on this topic.

So the non-relaxed ops already imply the expensive I/O barrier (mmiowb?),
and therefore PPC can drop it from spin_unlock()?

Also, I read mmiowb() as MMIO-write-barrier(); what do we have to
order/contain MMIO reads?

I have _0_ experience with MMIO, so I've no idea if ordering/containing
reads is silly or not.
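
For reference, the usage the proposal seems to imply for a lock-protected
device could be sketched roughly as below. This is a userspace model only,
not kernel code: spin_lock(), spin_unlock(), mmiowb() and writel_relaxed()
are hypothetical stand-ins that merely record the order in which they are
called, and struct fake_dev and ring_doorbell() are invented for
illustration. It assumes the proposed rule that relaxed accesses are not
ordered against LOCK/UNLOCK, so the driver issues mmiowb() itself before
dropping the lock.

```c
/* Userspace model: the stubs record call order, they are NOT kernel code. */
#include <assert.h>
#include <stdint.h>

enum event { EV_LOCK, EV_WRITE_RELAXED, EV_MMIOWB, EV_UNLOCK };

#define TRACE_MAX 16
static enum event trace[TRACE_MAX];
static int trace_len;

/* Append one event to the trace so the call order can be inspected. */
static void record(enum event ev)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = ev;
}

/* Hypothetical stand-ins for the kernel primitives named in the thread. */
struct fake_lock { int held; };

static void spin_lock(struct fake_lock *l)
{
	l->held = 1;
	record(EV_LOCK);
}

static void spin_unlock(struct fake_lock *l)
{
	l->held = 0;
	record(EV_UNLOCK);
}

static void mmiowb(void)
{
	record(EV_MMIOWB);
}

static void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	record(EV_WRITE_RELAXED);
}

/* Hypothetical device: a lock plus two "registers" in ordinary memory. */
struct fake_dev {
	struct fake_lock lock;
	volatile uint32_t regs[2];
};

/*
 * Two relaxed writes to the same device stay ordered with respect to each
 * other under the proposed semantics. The explicit mmiowb() before the
 * unlock is what would order them against MMIO writes issued by the next
 * CPU to take the lock.
 */
static void ring_doorbell(struct fake_dev *dev, uint32_t desc, uint32_t go)
{
	spin_lock(&dev->lock);
	writel_relaxed(desc, &dev->regs[0]);
	writel_relaxed(go, &dev->regs[1]);
	mmiowb();
	spin_unlock(&dev->lock);
}
```

With the non-relaxed accessors the mmiowb() call would presumably go away
under the proposal, since those would be ordered against LOCK/UNLOCK by
definition.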