On Fri, Feb 22, 2019 at 1:49 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The case we want to go fast is the spin-lock and unlock case, not the
> "set pending" case.
>
> And the way you implemented this, it's exactly the wrong way around.

Oh, one more comment: couldn't we make that mmiowb flag be right next
to the preemption count?

Because that's the common case anyway, where a spinlock increments the
preemption count too. If we put the mmiowb state in the same
cacheline, we don't cause extra cache effects, which is what really
matters, I guess.

I realize this is somewhat inconvenient, because some architectures
put preempt count in the thread structure, and others do it as a
percpu variable. But maybe the architecture could just declare where
the mmiowb state is?

               Linus
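
For illustration only, here is a rough userspace sketch of the layout idea above: the mmiowb pending flag sits right next to the preemption count, so the lock/unlock fast path touches a single cacheline. This is not kernel code. Every `sketch_` name is hypothetical, the 64-byte cacheline size is an assumption, and the counter merely stands in for a real mmiowb() barrier.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHELINE 64	/* assumed cacheline size */

/*
 * Hypothetical per-CPU (or per-thread) state. The pending flag is
 * placed directly after the preemption count so both share a line.
 */
struct sketch_cpu_state {
	_Alignas(SKETCH_CACHELINE) int preempt_count;	/* bumped by every spinlock */
	int mmiowb_pending;				/* set by MMIO writes */
};

/* Check at compile time that the flag ends inside the first cacheline. */
static_assert(offsetof(struct sketch_cpu_state, mmiowb_pending) + sizeof(int)
	      <= SKETCH_CACHELINE, "mmiowb state must share the cacheline");

/* Stand-in for the real barrier: just count how often it would be issued. */
static int sketch_barriers_issued;

static inline void sketch_mmiowb(void)
{
	sketch_barriers_issued++;
}

/* Lock fast path: only the preemption count is touched. */
static inline void sketch_spin_lock(struct sketch_cpu_state *s)
{
	s->preempt_count++;
}

/* "Set pending" path: an MMIO write records that a barrier is owed. */
static inline void sketch_mmio_write(struct sketch_cpu_state *s)
{
	s->mmiowb_pending = 1;
}

/* Unlock fast path: one flag test on the cacheline we already hold. */
static inline void sketch_spin_unlock(struct sketch_cpu_state *s)
{
	if (s->mmiowb_pending) {
		s->mmiowb_pending = 0;
		sketch_mmiowb();
	}
	s->preempt_count--;
}
```

An architecture that keeps the preempt count in the thread structure and one that keeps it percpu would each declare the flag next to their own count; the sketch just picks one struct to show the adjacency.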