On Thu, Mar 22, 2018 at 3:24 PM, Benjamin Herrenschmidt
<benh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 2018-03-21 at 08:53 -0500, Sinan Kaya wrote:
>> writel_relaxed() needs to have ordering guarantees with respect to the order
>> device observes writes.
>
> Correct.
>
>> x86 has compiler barrier inside the relaxed() API so that code does not
>> get reordered. ARM64 architecturally guarantees device writes to be observed
>> in order.
>>
>> I was hoping that PPC could follow x86 and inject compiler barrier into the
>> relaxed functions.
>>
>> BTW, I have no idea what compiler barrier does on PPC and if
>>
>> writel() == compiler barrier() + writel_relaxed()
>>
>> can be said.
>
> No, it's not sufficient.
>
> Replacing wmb() + writel() with wmb() + writel_relaxed() will work on
> PPC, it will just not give you a benefit today.
>
> The main problem is that the semantics of writel/writel_relaxed (and
> read versions) aren't very well defined in Linux esp. when it comes
> to different memory types (NC, WC, ...).
>
> I've been wanting to implement the relaxed accessors for a while but
> was battling with this to try to also better support WC, and due to
> other commitments, this somewhat fell down the cracks.
>
> Two options I can think of:
>
> - Just make the _relaxed variants use an eieio instead of a sync, this
> will effectively lift the ordering guarantee vs. cachable storage (and
> thus unlock) and might give a (small) performance improvement.

Wouldn't we still have the unlock ordering due to the io_sync hack, or
are you thinking we should remove that too for the relaxed version?

> However,
> we still have the problem that on WC mappings, neither writel nor
> writel_relaxed will effectively allow combining to happen (only raw
> accesses will because on powerpc *all* barriers will break combining).

Hmm, eieio is only architected to affect CI+G (and WT) so it shouldn't
affect combining on non-guarded memory. Do most implementations apply
it to all CI accesses anyway?
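
To make the io_sync question concrete, here is a rough userspace model of
the first option. It is only a sketch under assumptions: every model_* name
is made up for illustration, barriers are logged as strings instead of being
executed (so it builds on any host), and the writel()/unlock behaviour shown
is my reading of the io_sync hack, not a copy of the powerpc code. The
keep_io_sync parameter stands for the open question of whether the relaxed
variant would still set the flag.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not kernel code: barriers are appended to a trace
 * string instead of being executed. */
static char model_trace[128];
static int model_io_sync;	/* stands in for the per-CPU io_sync flag */

static void model_emit(const char *op)
{
	size_t used = strlen(model_trace);

	snprintf(model_trace + used, sizeof(model_trace) - used,
		 "%s%s", used ? " " : "", op);
}

static void model_reset(void)
{
	model_trace[0] = '\0';
	model_io_sync = 0;
}

/* Assumed shape of the non-relaxed writel(): sync, store, flag the write. */
static void model_writel(uint32_t val, volatile uint32_t *addr)
{
	model_emit("sync");
	*addr = val;
	model_emit("store");
	model_io_sync = 1;
}

/* Option 1: eieio instead of sync. Whether the flag is still set is the
 * open question, so it is a parameter here. */
static void model_writel_relaxed_opt1(uint32_t val, volatile uint32_t *addr,
				      int keep_io_sync)
{
	model_emit("eieio");
	*addr = val;
	model_emit("store");
	if (keep_io_sync)
		model_io_sync = 1;
}

/* Assumed unlock path: a full barrier only if an MMIO write was flagged.
 * "unlock" covers the usual release barrier plus the lock store. */
static void model_unlock(void)
{
	if (model_io_sync) {
		model_emit("sync");
		model_io_sync = 0;
	}
	model_emit("unlock");
}
```

With keep_io_sync = 0, a relaxed write followed by an unlock logs
"eieio store unlock". With keep_io_sync = 1 the unlock still pays for a
sync ("eieio store sync unlock"), which is the case I am asking about.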

> - Make writel_relaxed() be a simple store without barriers, and
> readl_relaxed() be "eieio, read, eieio", thus allowing write combining
> to happen between successive writel_relaxed on WC space (no change on
> normal NC space) while maintaining the ordering between relaxed reads
> and writes. The flip side is a (slight) increased overhead of
> readl_relaxed.

Are there many drivers that actually do writeX() on WC space?
memory-barriers.txt pretty much says that all bets are off and no
ordering guarantees can be assumed when using readX/writeX on
prefetchable IO memory. It seems sketchy enough to give me some pause,
but maybe it works fine elsewhere.

Oliver
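
For reference, the second option quoted above could be modeled in the same
rough way. Again this is only a sketch: the opt2_* names are invented for
illustration, and the barriers are logged as strings rather than executed,
so nothing here says what real hardware does with WC mappings.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not kernel code: operations are appended to a
 * trace string so the sequence can be inspected on any host. */
static char opt2_trace[128];

static void opt2_emit(const char *op)
{
	size_t used = strlen(opt2_trace);

	snprintf(opt2_trace + used, sizeof(opt2_trace) - used,
		 "%s%s", used ? " " : "", op);
}

/* Option 2 write: a plain store with no barrier on either side. */
static void opt2_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	opt2_emit("store");
}

/* Option 2 read: "eieio, read, eieio" as described in the quoted text. */
static uint32_t opt2_readl_relaxed(const volatile uint32_t *addr)
{
	uint32_t val;

	opt2_emit("eieio");
	val = *addr;
	opt2_emit("load");
	opt2_emit("eieio");
	return val;
}
```

Two back-to-back opt2_writel_relaxed() calls log "store store" with no
barrier in between, which is what would leave room for combining on WC
space. The cost moves to the read side, where every relaxed read is
bracketed by two barriers.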