On 09/10/2014 05:48 PM, James Bottomley wrote:
> On Tue, 2014-09-09 at 06:40 -0400, Peter Hurley wrote:
>> On 09/08/2014 10:56 PM, James Bottomley wrote:
>>> On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
>>>> On 09/08/2014 01:50 AM, James Bottomley wrote:
>>>>>> But additionally, even if gcc combines adjacent writes _that are
>>>>>> part of the program flow_ then I believe the situation is no worse
>>>>>> than would otherwise exist.
>>>>>>
>>>>>> For instance, given the following:
>>>>>>
>>>>>> struct x {
>>>>>>         spinlock_t lock;
>>>>>>         long a;
>>>>>>         byte b;
>>>>>>         byte c;
>>>>>> };
>>>>>>
>>>>>> void locked_store_b(struct x *p)
>>>>>> {
>>>>>>         spin_lock(&p->lock);
>>>>>>         p->b = 1;
>>>>>>         spin_unlock(&p->lock);
>>>>>>         p->c = 2;
>>>>>> }
>>>>>>
>>>>>> Granted, the author probably expects ordered writes of
>>>>>>
>>>>>>         STORE B
>>>>>>         STORE C
>>>>>>
>>>>>> but that's not guaranteed because there is no memory barrier
>>>>>> ordering B before C.
>>>>>
>>>>> Yes, there is: loads and stores may not migrate into or out of
>>>>> critical sections.
>>>>
>>>> That's a common misconception.
>>>>
>>>> The processor is free to re-order this to:
>>>>
>>>>         STORE C
>>>>         STORE B
>>>>         UNLOCK
>>>>
>>>> That's because the unlock() only guarantees that:
>>>>
>>>> Stores before the unlock in program order are guaranteed to complete
>>>> before the unlock completes. Stores after the unlock _may_ complete
>>>> before the unlock completes.
>>>>
>>>> My point was that even if compiler barriers had the same semantics
>>>> as memory barriers, the situation would be no worse. That is, code
>>>> that is sensitive to memory barriers (like the example I gave above)
>>>> would merely have the same fragility with one-way compiler barriers
>>>> (with respect to the compiler combining writes).
>>>>
>>>> That's what I meant by "no worse than would otherwise exist".
>>>
>>> Actually, that's not correct. This is actually déjà vu with me on the
>>> other side of the argument. When we first did spinlocks on PA, I
>>> argued as you did: lock is only a barrier for code after it, and
>>> unlock only for code before it. The failing case is that you can have
>>> a critical section which performs an atomically required operation
>>> and a following unit which depends on it being performed. If you
>>> begin the following unit before the atomic requirement, you may end
>>> up losing. It turns out this kind of pattern is inherent in a lot of
>>> mailbox device drivers: you need to set up the mailbox atomically,
>>> then poke it. Setup is usually atomic; deciding which mailbox to
>>> prime and actually poking it is in the following unit. Priming often
>>> involves an I/O bus transaction, and if you poke before priming, you
>>> get a misfire.
>>
>> Take it up with the man, because this was discussed extensively last
>> year and it was decided that unlocks would not be full barriers.
>> Thus the changes to memory-barriers.txt that explicitly note this,
>> and the addition of smp_mb__after_unlock_lock() (for two different
>> locks; an unlock followed by a lock on the same lock is a full
>> barrier).
>>
>> Code that expects ordered writes after an unlock needs to explicitly
>> add the memory barrier.
>
> I don't really care what ARM does; spin locks are full barriers on
> architectures that need them. The driver problem we had that detected
> our semi-permeable spinlocks was an LSI 53c875, which is
> enterprise-class PCI, so presumably not relevant to ARM anyway.

Almost certainly, ia64's arch_spin_unlock() is not a full barrier.
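
To make "explicitly add the memory barrier" concrete, here is a minimal
sketch of the fix for the example quoted above. Two assumptions of mine:
the flag fields are spelled u8, since "byte" isn't a C type, and the
smp_mb() placement is mine rather than from the thread:

	#include <linux/spinlock.h>

	struct x {
		spinlock_t lock;
		long a;
		u8 b;
		u8 c;
	};

	void locked_store_b(struct x *p)
	{
		spin_lock(&p->lock);
		p->b = 1;
		spin_unlock(&p->lock);
		smp_mb();	/* order STORE B before STORE C; the
				 * unlock alone does not guarantee it */
		p->c = 2;
	}

The unlock only orders stores that precede it in program order, so
without the smp_mb() between the two stores, the store to c may
complete before the store to b, exactly the STORE C / STORE B / UNLOCK
reordering described above.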
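For the cross-lock case mentioned above, the primitive that went in
alongside the memory-barriers.txt change is smp_mb__after_unlock_lock(),
invoked after the second acquire. A sketch (the locks and fields here
are hypothetical, purely for illustration):

	spin_lock(&p->lock);
	p->b = 1;
	spin_unlock(&p->lock);

	spin_lock(&q->lock);		/* a different lock */
	smp_mb__after_unlock_lock();	/* promote UNLOCK+LOCK across
					 * different locks to a full
					 * barrier */
	q->c = 2;
	spin_unlock(&q->lock);

Without the smp_mb__after_unlock_lock(), only an unlock followed by a
lock of the _same_ lock is guaranteed to act as a full barrier.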