Will Deacon's on February 23, 2019 4:50 am:
> The mmiowb() macro is horribly difficult to use and drivers will continue
> to work most of the time if they omit a call when it is required.
>
> Rather than rely on driver authors getting this right, push mmiowb() into
> arch_spin_unlock() for ia64. If this is deemed to be a performance issue,
> a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
> the barrier in cases where no I/O writes were performed inside the
> critical section.

mmiowb() was always the wrong approach. IIRC what happened is that an ia64
platform found that real wmb() semantics were too expensive, so they kind
of "relaxed" it, breaking everything, and then said that drivers that wanted
to unbreak themselves had to add these mmiowb() calls.

The right way to go of course would have been to implement wmb() the way
existing drivers expected, and add a faster io_wmb() that only ordered
mmio stores from the CPU, used in the few drivers that the platform cared
about.

I think it was argued that wmb() was still technically correct because the
reordering did not happen at the CPU, but somewhere else in the
interconnect or PCI controller. But that was just a crazy burden to put
on driver writers, and it was why the documentation was always
incomprehensible.

Not sure why Linus ever went along with it, but awesome you're removing
it. Thank you!

Thanks,
Nick
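
For illustration only, the optimisation mentioned in the quoted patch
description (skip the barrier at unlock when no I/O writes happened inside
the critical section) could look roughly like the userspace model below.
Everything in it is an assumption made for the sketch: struct mmiowb_model,
cpu_state and the model_*() functions are hypothetical names, a single
struct stands in for per-CPU state, and a counter stands in for the real
barrier instruction. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU tracking state (one "CPU" only). */
struct mmiowb_model {
    int  nesting;   /* number of spinlocks currently held */
    bool pending;   /* an MMIO write was issued while a lock was held */
    int  barriers;  /* how many barriers were issued (for observation) */
};

static struct mmiowb_model cpu_state;

/* Stand-in for the real mmiowb(); here it only counts invocations. */
static void model_mmiowb(void)
{
    cpu_state.barriers++;
}

/* Taking a lock just records that we are inside a critical section. */
static void model_spin_lock(void)
{
    cpu_state.nesting++;
}

/* Stand-in for an MMIO write accessor such as writel(). */
static void model_mmio_write(void)
{
    /* Only writes made under a lock need ordering against the unlock. */
    if (cpu_state.nesting > 0)
        cpu_state.pending = true;
}

/* Unlock issues the barrier only if an MMIO write is pending. */
static void model_spin_unlock(void)
{
    assert(cpu_state.nesting > 0);
    if (cpu_state.pending) {
        cpu_state.pending = false;
        model_mmiowb();
    }
    cpu_state.nesting--;
}
```

With this model, a model_spin_lock()/model_spin_unlock() pair with no
model_mmio_write() in between leaves cpu_state.barriers unchanged, while a
pair that contains a write bumps it by one. The driver author never calls
the barrier directly, which is the point of moving it into the unlock path.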