On Thu, 2010-01-21 at 11:18 -0800, David Daney wrote:
> Steven Rostedt wrote:
> > Peter Zijlstra and I were doing a look over of places that assign
> > current->state = TASK_*INTERRUPTIBLE, by simply looking at places with:
> >
> >  $ git grep -A1 'state[[:space:]]*=[[:space:]]*TASK_[^R]'
> >
> > and it seems there are quite a few places that looks like bugs. To be on
> > the safe side, everything outside of a run queue lock that sets the
> > current state to something other than TASK_RUNNING (or dead) should be
> > using set_current_state().
> >
> >     current->state = TASK_INTERRUPTIBLE;
> >     schedule();
> >
> > is probably OK, but it would not hurt to be consistent. Here's a few
> > examples of likely bugs:
> >
> [...]
>
> This may be a bit off topic, but exactly which type of barrier should
> set_current_state() be implying?
>
> On MIPS, set_mb() (which is used by set_current_state()) has a full mb().
>
> Some MIPS based processors have a much lighter weight wmb().  Could
> wmb() be used in place of mb() here?

Nope, wmb() is not enough. Below is an explanation.

>
> If not, an explanation of the required memory ordering semantics here
> would be appreciated.
>
> I know the documentation says:
>
>     set_current_state() includes a barrier so that the write of
>     current->state is correctly serialised wrt the caller's subsequent
>     test of whether to actually sleep:
>
>         set_current_state(TASK_UNINTERRUPTIBLE);
>         if (do_i_need_to_sleep())
>             schedule();
>
>
> Since the current CPU sees the memory accesses in order, what can be
> happening on other CPUs that would require a full mb()?
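
For reference, the documented pattern quoted above can be modelled in
userspace with C11 atomics. This is only a rough sketch, not kernel code:
`state` and `x` stand in for current->state and the wakeup condition, the
function names are made up for illustration, and using a seq_cst fence on
both sides is my assumption for the model.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

static atomic_int state = TASK_RUNNING;  /* stand-in for current->state */
static atomic_int x = 0;                 /* stand-in for the wakeup condition */

/*
 * Waiter side: publish the sleeping state, then test the condition.
 * Returns true if the caller should go on to "schedule()".
 */
bool waiter_should_sleep(void)
{
	atomic_store_explicit(&state, TASK_UNINTERRUPTIBLE, memory_order_relaxed);
	/* Full fence: orders the store above against the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&x, memory_order_relaxed) == 0;
}

/*
 * Waker side: set the condition, then look at the task state.
 * Returns true if it found a sleeper and put it back to TASK_RUNNING.
 */
bool waker_wake_up(void)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	/* Full fence: orders the store to x against the load of state. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&state, memory_order_relaxed) != TASK_RUNNING) {
		atomic_store_explicit(&state, TASK_RUNNING, memory_order_relaxed);
		return true;
	}
	return false;
}
```

With a full fence on both sides, at least one of the two loads has to see
the other side's store: either the waiter sees x set and does not sleep, or
the waker sees the non-running state and wakes the task. A store-only
barrier in the waiter gives no such guarantee.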

Let's look at a hypothetical situation with:

    add_wait_queue();
    current->state = TASK_UNINTERRUPTIBLE;
    smp_wmb();
    if (!x)
        schedule();

Then somewhere we probably have:

    x = 1;
    smp_wmb();
    wake_up(queue);


        CPU 0                           CPU 1
    ------------                    -----------
    add_wait_queue();
    (cpu pipeline sees a load
     of x ahead, and preloads it)
                                    x = 1;
                                    smp_wmb();
                                    wake_up(queue);
                                    (task on CPU 0 is still at TASK_RUNNING);

    current->state = TASK_UNINTERRUPTIBLE;
    smp_wmb();  <<-- does not prevent early loading of x
    if (!x)     <<-- returns true
        schedule();

Now the task on CPU 0 missed the wake up. smp_wmb() only orders stores
against other stores. Ordering the store to current->state against the
later load of x takes a full smp_mb().

Note, places that call schedule() are not fast paths, and are probably not
called often. Adding the overhead of smp_mb() to ensure correctness is a
small price to pay compared to searching for why you have a stuck task that
was never woken up.

Read Documentation/memory-barriers.txt, it will be worth the time you
spend doing so.

-- Steve