>>>>> "Roland" == Roland Dreier <rdreier@xxxxxxxxx> writes:

>> This is a different issue. We deal with it on powerpc by having
>> writel set a per-cpu flag and spin_unlock() test it, and do the
>> barrier if needed there.

Roland> Cool... I assume you do this for mutex_unlock() etc?

Roland> Is there any reason why ia64 can't do this too so we can kill
Roland> mmiowb and save everyone a lot of hassle? (mips, sh and frv
Roland> have non-empty mmiowb() definitions too but I'd guess that
Roland> these are all bugs based on misunderstandings of the mmiowb()
Roland> semantics...)

Hi Roland,

That's not going to solve the problem on Altix. On Altix the issue is
that there can be multiple paths through the NUMA fabric from cpu X to
the PCI bridge.

Consider this uber-cool<tm> ascii art - NR is my abbreviation for NUMA
router:

    -------          -------
    |cpu X|          |cpu Y|
    -------          -------
    |     \____  ____/     |
    |          \/          |
    |      ____/\____      |
    |     /          \     |
    ------            ------
    |NR 1|            |NR 2|
    ------            ------
          \          /
           \        /
            -------
            | PCI |
            -------

The problem is that the two writel()s in your example, despite both
being issued on cpu X under the spin lock, can end up with the first
one going through NR 1 and the second one going through NR 2. If
there's contention on NR 1, the write going via NR 2 may hit the PCI
bridge before the one going via NR 1. Of course, the bigger the
system, the worse the problem....

The only way to guarantee ordering in the above setup is to either
make writel() fully ordered or add an mmiowb() between the two
writel()s. On Altix you have to go and read from the PCI bridge to
ensure all writes to it have been flushed, which is also what mmiowb()
is doing. If writel() were to guarantee this ordering, it would make
every writel() call extremely expensive :-(

Cheers,
Jes
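
To make the reordering concrete, here is a small user-space C model of
the picture above. It is only a sketch, not kernel code: struct fabric,
fabric_init(), sim_writel(), sim_mmiowb() and the round-robin choice of
route are all made up for illustration, and the latencies are arbitrary
ticks. The only things taken from the discussion are that posted writes
from one cpu may travel via different routers, and that waiting for the
bridge to have received everything restores the order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one cpu, two NUMA routers, one PCI bridge. */
enum { NUM_ROUTES = 2, MAX_WRITES = 16 };

struct fabric {
    int latency[NUM_ROUTES]; /* ticks a write spends on each route */
    int now;                 /* current tick on the issuing cpu */
    int next_route;          /* round-robin route choice (assumption) */
    int last_arrival;        /* latest arrival tick of any posted write */
    int arrival[MAX_WRITES]; /* arrival tick per write, in issue order */
    int count;               /* number of writes issued so far */
};

struct fabric fabric_init(int latency_nr1, int latency_nr2)
{
    struct fabric f = {0};
    f.latency[0] = latency_nr1;
    f.latency[1] = latency_nr2;
    return f;
}

/* Post a write: the cpu does not wait for it to reach the bridge. */
void sim_writel(struct fabric *f)
{
    assert(f->count < MAX_WRITES);
    int route = f->next_route;
    int arrives = f->now + f->latency[route];

    f->arrival[f->count++] = arrives;
    if (arrives > f->last_arrival)
        f->last_arrival = arrives;
    f->next_route = (route + 1) % NUM_ROUTES;
    f->now += 1;
}

/* Stand-in for mmiowb(): stall until every posted write has landed. */
void sim_mmiowb(struct fabric *f)
{
    if (f->now < f->last_arrival)
        f->now = f->last_arrival;
}

/* True if the bridge saw the writes in the order they were issued. */
bool arrived_in_issue_order(const struct fabric *f)
{
    for (int i = 0; i + 1 < f->count; i++)
        if (f->arrival[i] >= f->arrival[i + 1])
            return false;
    return true;
}
```

With a congested NR 1 (10 ticks) and an idle NR 2 (1 tick), two
back-to-back sim_writel() calls land at ticks 10 and 2, i.e. out of
order. Putting sim_mmiowb() between them gives ticks 10 and 11, but
the cpu sits idle until tick 10 first, which is the cost you would pay
on every call if writel() itself had to guarantee ordering.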