On Nov 29, 2013 8:18 AM, "Will Deacon" <will.deacon@xxxxxxx> wrote:
>
> To get some sort of
> idea, I tried adding a dmb to the start of spin_unlock on ARMv7 and I saw a
> 3% performance hit in hackbench on my dual-cluster board.
Don't do a dmb. Just do a dummy release. You just said that on arm64 an unlock+lock is a memory barrier, so just make the mb__before_spinlock() be a dummy store with release to the stack.
That should be noticeably cheaper than a full dmb.
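
Something like the following, as a rough userspace sketch in C11 atomics rather than actual kernel code. The names dummy_release_store() and store_then_dummy_release() are made up for illustration and are not existing kernel APIs. Note that the C11 memory model by itself does not promise that a store-release to a private stack slot orders anything against a later lock acquire; the idea leans entirely on the arm64 hardware behaviour described above (release followed by acquire acting as a full barrier), and a real implementation would presumably use the architecture's own store-release instruction via inline asm so the compiler cannot drop the store.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (not a kernel API): perform a store with release
 * semantics to a throwaway variable on the current stack frame.
 * The stored value is irrelevant; only the release ordering matters.
 */
static inline void dummy_release_store(void)
{
	atomic_int dummy;	/* lives on this function's stack frame */

	atomic_init(&dummy, 0);
	atomic_store_explicit(&dummy, 1, memory_order_release);
}

/*
 * Hypothetical usage: do an ordinary store, then issue the dummy
 * release where a full barrier would otherwise have been placed,
 * just before taking a lock.
 */
static inline int store_then_dummy_release(int *slot, int value)
{
	*slot = value;		/* plain store that should be ordered */
	dummy_release_store();	/* stand-in for the heavier full barrier */
	return *slot;
}
```

Whether this really comes out cheaper than the dmb is something that would have to be measured with the same hackbench run.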
Linus