Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patch added to -mm tree

Oleg Nesterov <oleg@xxxxxxxxxx> · Sat, 23 Feb 2008 22:36:20 +0300

On 02/23, Linus Torvalds wrote:
>
> On Sat, 23 Feb 2008, Oleg Nesterov wrote:
> > 
> > Yes, but still I suspect wmb() is not enough. Note that try_to_wake_up()
> > first checks (reads) the task->state,
> > 
> > 	if (!(old_state & state))
> > 		goto out;
> > 
> > without the full mb() it is (in theory) possible that try_to_wake_up()
> > first reads TASK_RUNNING and only then sets CONDITION. IOW, STORE and
> > LOAD could be re-ordered.
> 
> No. The spinlock can have preceding stores (and loads, for that matter) 
> percolate *into* the locked region, but a spinlock can *not* have loads 
> (and stores) escape *out* of the region withou being totally broken. 
> 
> So the spinlock that predeces that load of "old_state" absolutely *must* 
> be a memory barrier as far as that load is concerned.
> 
> Spinlocks are just "one-way" permeable. They stop both reads and writes, 
> but they stop them from escaping up to earlier code (and the spin-unlock 
> in turn stops reads and writes within the critical section from escaping 
> down past the unlock).
> 
> But they definitely must stop that direction, and no loads or stores that 
> are inside the critical section can possibly be allowed to be done outside 
> of it, or the spinlock would be pointless.

Yes sure. But I meant that the STORE (set CONDITION) can leak into the critical
section, and it could be re-ordered with the LOAD (check ->state) _inside_
the critical section.

However,

> So I think a simple smp_wmb() should be ok.

now I think you are probably right. Because (please ack/nack my understanding)
this smp_wmb() ensures that the preceding STORE can't actually go into
the locked region.

> That said, I do not *mind* the notion of "smp_mb_before/after_spinlock()", 
> and just have architectures do whatever they want (with x86 just being a 
> no-op). I don't think it's wrong in any way, and may be the really 
> generic solution for some theoretical case where locking is done not by 
> using the normal cache coherency, but over a separate lock network. But I 
> would suggest against adding new abstractions unless there are real cases 
> of it mattering.
> 
> (The biggest reason to do it in practice is probably architectures that 
> have a really slow smp_wmb() and that also have a spinlock that is already 
> serializing enough, but if this is only for one single use - in 
> try_to_wake_up() - even that doesn't really seem to be a very strong 
> argument).

Great.

> > A bit offtopic, but let's take another example, __queue_work()->insert_work().
> > With some trivial modification we can move set_wq_data() outside of cwq->lock.
> > But according to linux's memory model we still need wmb(), which is a bit
> > annoying. Perhaps we can also add smp_wmb_before_spinlock(). Not sure this
> > is not too ugly though.
> 
> Is that really so performance-critical? We still need the spinlock for the 
> actual list manipulation, so why would we care?
> 
> And the smp_wmb() really should be free. It's literally free on x86, but 
> doing a write barrier is generally a cheap operation on any sane 
> architecture (ie generally cheaper than a read barrier or a full barrier: 
> it should mostly be a matter of inserting a magic entry in the write 
> buffers).

Yes, yes, this was just a "on a related note", thanks!

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html