On Fri, Feb 22, 2019 at 01:55:20PM -0800, Linus Torvalds wrote:
> On Fri, Feb 22, 2019 at 1:49 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > The case we want to go fast is the spin-lock and unlock case, not the
> > "set pending" case.
> >
> > And the way you implemented this, it's exactly the wrong way around.
>
> Oh, one more comment: couldn't we make that mmiowb flag be right next
> to the preemption count?
>
> Because that's the common case anyway, where a spinlock increments the
> preemption count too. If we put the mmiowb state in the same
> cacheline, we don't cause extra cache effects, which is what really
> matters, I guess.
>
> I realize this is somewhat inconvenient, because some architectures
> put preempt count in the thread structure, and others do it as a
> percpu variable. But maybe the architecture could just declare where
> the mmiowb state is?

I think that should be doable... I'll have a play.

Will
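For illustration only, here is a minimal user-space sketch of the layout being discussed, assuming a 64-byte cache line. All names (`sketch_cpu_state`, `sketch_mmiowb_state()`, `sketch_spin_lock()`, and so on) are hypothetical and are not the kernel's actual definitions. The preemption count and the mmiowb bookkeeping sit side by side, so the lock/unlock fast path touches one cache line. The "set pending" path is the rarer one, and `sketch_mmiowb_state()` stands in for an architecture declaring where its state lives (per-CPU variable or thread structure).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line size; real hardware varies. */
#define SKETCH_CACHELINE 64

/*
 * Hypothetical per-CPU state: preemption count and mmiowb bookkeeping
 * are co-located so lock/unlock touch a single cache line.
 */
struct sketch_cpu_state {
	_Alignas(SKETCH_CACHELINE) int preempt_count;
	unsigned short mmiowb_nesting;  /* spinlocks currently held */
	unsigned short mmiowb_pending;  /* MMIO write issued under a lock */
};

/* Compile-time check that all three fields fit in one cache line. */
static_assert(offsetof(struct sketch_cpu_state, mmiowb_pending) +
	      sizeof(unsigned short) <= SKETCH_CACHELINE,
	      "preempt count and mmiowb state must share a cache line");

/* Stand-in for one CPU's storage; a kernel would use per-CPU data. */
static struct sketch_cpu_state cpu0;

/* Counts simulated mmiowb() barriers so the behaviour is observable. */
static int barriers_issued;

/* Hypothetical arch hook: the architecture says where the state lives. */
static struct sketch_cpu_state *sketch_mmiowb_state(void)
{
	return &cpu0;
}

/* Fast path: both increments hit the same cache line. */
static void sketch_spin_lock(void)
{
	struct sketch_cpu_state *s = sketch_mmiowb_state();

	s->preempt_count++;   /* models preempt_disable() */
	s->mmiowb_nesting++;
}

/* Slow path, called from MMIO write accessors (hypothetical hook). */
static void sketch_mmiowb_set_pending(void)
{
	struct sketch_cpu_state *s = sketch_mmiowb_state();

	if (s->mmiowb_nesting)  /* only relevant while a lock is held */
		s->mmiowb_pending = 1;
}

/* Fast path: the pending check reads the already-hot cache line. */
static void sketch_spin_unlock(void)
{
	struct sketch_cpu_state *s = sketch_mmiowb_state();

	if (s->mmiowb_pending) {  /* rare: an MMIO write happened */
		s->mmiowb_pending = 0;
		barriers_issued++;    /* would be mmiowb() */
	}
	s->mmiowb_nesting--;
	s->preempt_count--;       /* models preempt_enable() */
}
```

A lock/unlock pair with no MMIO write in between issues no barrier; one with a write issues exactly one, at unlock time.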