On Thu, Nov 28, 2013 at 06:53:41PM +0000, Will Deacon wrote:
> On Thu, Nov 28, 2013 at 06:27:12PM +0000, Paul E. McKenney wrote:
> > On Thu, Nov 28, 2013 at 06:03:18PM +0000, Will Deacon wrote:
> > > Hmm, without horrible hacks to keep track of whether we've done an
> > > mb__before_spinlock() without a matching spinlock(), that's going to end up
> > > with full-barrier + pointless half-barrier (similarly on the release path).
> >
> > We should be able to detect mb__before_spinlock() without a matching
> > spinlock via static analysis, right?
> >
> > Or am I missing your point?
>
> See below...
>
> > > > Yes, we might need better names, but I believe that this approach does
> > > > what you need.
> > > >
> > > > Thoughts?
> > >
> > > I still think we need to draw the distinction between ordering all accesses
> > > against a lock and ordering an unlock against a lock. The latter is free for
> > > arm64 (STLR => LDAR is ordered) but the former requires a DMB.
> > >
> > > Not sure I completely got your drift...
> >
> > Here is what I am suggesting:
> >
> > o mb__before_spinlock():
> >
> >     o Must appear immediately before a lock acquisition.
> >     o Upgrades a lock acquisition to a full barrier.
> >     o Emits DMB on ARM64.
>
> Ok, so that then means that:
>
>     mb__before_spinlock();
>     spin_lock();
>
> on ARM64 expands to:
>
>     dmb ish
>     ldaxr ...
>
> so there's a redundant half-barrier there. If we want to get rid of that, we
> need mb__before_spinlock() to set a flag, then we could conditionalise
> ldaxr/ldxr but it's really horrible and you have to deal with interrupts
> etc. so in reality we just end up having extra barriers.

Given that there was just a dmb, how much does the ish &c really hurt?
Would the performance difference be measurable at the system level?

> Or we have a separate spin_lock_mb() function.

And mutex_lock_mb(). And spin_lock_irqsave_mb(). And spin_lock_irq_mb().
And...

Admittedly this is not yet a problem given the current very low usage of
smp_mb__before_spinlock(), but the potential for API explosion is
non-trivial.

That said, if the effect on ARM64 is measurable at the system level, I
won't stand in the way of the additional APIs.

> > o mb__after_spinlock():
> >
> >     o Must appear immediately after a lock acquisition.
> >     o Upgrades an unlock+lock pair to a full barrier.
> >     o Emits a no-op on ARM64, as in "do { } while (0)".
> >     o Might need a separate flavor for queued locks on
> >       some platforms, but no sign of that yet.
>
> Ok, so mb__after_spinlock() doesn't imply a full barrier but
> mb__before_spinlock() does? I think people will get that wrong :)

As I said earlier in the thread, I am open to better names.

How about smp_mb__after_spin_unlock_lock_pair()? That said, I am sure that
I could come up with something longer given enough time. ;-)

                                                        Thanx, Paul