On Fri, 1 Jul 2005, Alan Cox wrote:

> > But that mentions compiler only, not CPU ordering! I understand the BIU
> > of the issuing CPU and any external hardware is still permitted to
> > merge/reorder these accesses unless separated by wmb()/rmb()/mb() as
>
> I think the practical situation is that this implies ordering to the bus
> interface. It might be interesting to ask the powerpc people their
> experience but looking at most PCI drivers they assume this and it would
> be expensive not to do so on x86.

Hmm, doing this OTOH would be expensive on platforms actually requiring
explicit barriers for this to be the case. The problem is only drivers
know what they expect, e.g. you may need as much as:

	writel();
	mb();
	readl();

but only:

	readl();
	rmb();
	readl();

With barriers coded explicitly in drivers, you may control this, with ones
inside these mmio functions/macros you need to use mb() everywhere as you
don't know what the surrounding operations are going to be. And mb() may
be significantly more expensive than rmb().

Of course to facilitate such explicit barriers for platforms where
inter-processor ordering rules are different to ones for mmio a different
set of operations would have to be defined -- actually we've already got
one, mmiowb(), as a starting point.

> > We have that iob() macro/call as well, so that you can push cycles out of
> > the CPU domain immediately as well, which is equivalent to:
> >
> > mb();
> > make_host_complete_writes();
>
> My feeling is the default readb etc are __readb + mb + make_hos...

Hmm, barriers are normally expected to happen *before* affected
operations, which is natural and often much faster as in the case of
traditional MIPS write-back buffers, where there is no "flush" operation
and mb() is just a tight loop spinning on the WB condition non-empty,
e.g.: "0: bc0f 0b" till the buffer empties itself. So I'd rather make
readb() being mb() + make_host_complete_writes() + __readb().
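For illustration only, here is a rough userspace sketch of that
barrier-before ordering, next to the driver-scheduled alternative for two
back-to-back reads. Everything in it is an assumption made up for the
example: the mock_* stubs merely record their names in a trace and are not
the kernel primitives, and sketch_readb() and explicit_two_reads() are
hypothetical names.

```c
#include <string.h>

/* Userspace mock only: each stub appends its name to a trace so the
 * ordering can be inspected.  None of these are real kernel APIs. */
char trace[128];

void trace_reset(void)
{
	trace[0] = '\0';
}

void trace_add(const char *op)
{
	size_t room = sizeof(trace) - strlen(trace) - 1;

	if (trace[0] != '\0' && room > 0) {
		strncat(trace, " ", room);
		room--;
	}
	strncat(trace, op, room);
}

/* Hypothetical stand-ins for mb(), rmb(), make_host_complete_writes()
 * and __readb(). */
void mock_mb(void)  { trace_add("mb"); }
void mock_rmb(void) { trace_add("rmb"); }
void mock_make_host_complete_writes(void) { trace_add("flush"); }

unsigned char mock_raw_readb(const volatile unsigned char *addr)
{
	trace_add("__readb");
	return *addr;
}

/* Barrier-before composition:
 * mb() + make_host_complete_writes() + __readb(). */
unsigned char sketch_readb(const volatile unsigned char *addr)
{
	mock_mb();
	mock_make_host_complete_writes();
	return mock_raw_readb(addr);
}

/* Driver-scheduled alternative: raw reads with one explicit rmb(). */
unsigned int explicit_two_reads(const volatile unsigned char *a,
				const volatile unsigned char *b)
{
	unsigned int lo = mock_raw_readb(a);

	mock_rmb();
	return lo | ((unsigned int)mock_raw_readb(b) << 8);
}
```

In this mock, two consecutive sketch_readb() calls would record the full
barrier and the flush twice, while explicit_two_reads() records a single
rmb between the raw reads.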
But it would be more painful performance-wise than necessary for many
cases, questioning the whole idea as any sane driver writer would prefer
to use these double-underscore calls and schedule barriers as necessary
manually anyway.

But if it's indeed what's intended I'd prefer it to be documented
somewhere in a reasonable place as there are people outside the Intel
world which may not necessarily know which interfaces imply Intel
semantics and which do not. ;-)

  Maciej