On Tue, Apr 12, 2011 at 2:08 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Wed, 13 Apr 2011 00:34:52 +1000 Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>
>> Well, not really - now taking any sleeping lock or waiting on
>> anything can trigger a plug flush where previously you had to
>> explicitly issue them. I'm not saying what we had is better, just
>> that there are implicit flushes with your changes that are
>> inherently uncontrollable...
>
> It's not just sleeping locks - if preempt is enabled a schedule can happen at
> any time - at any depth. I've seen a spin_unlock do it.

Hmm. I don't think we should flush IO in the preemption path. That
smells wrong on many levels, just one of them being the "any time, any
depth".

It also sounds really wrong from an IO pattern standpoint. The process
is actually still running, and the IO flushing _already_ does the
"only if it's going to sleep" test, but it actually does it _wrong_.
The "current->state" check doesn't make sense for a preemption event,
because it's not actually going to sleep there.

So a patch like the attached (UNTESTED!) sounds like the right thing to do.

Whether it makes any difference for any MD issues, who knows... But
considering that the unplugging already used to test for "prev->state
!= TASK_RUNNING", this is absolutely the right thing to do - that old
test was just broken.

                     Linus
 kernel/sched.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 48013633d792..a187c3fe027b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4111,20 +4111,20 @@ need_resched:
 					try_to_wake_up_local(to_wakeup);
 			}
 			deactivate_task(rq, prev, DEQUEUE_SLEEP);
+
+			/*
+			 * If we are going to sleep and we have plugged IO queued, make
+			 * sure to submit it to avoid deadlocks.
+			 */
+			if (blk_needs_flush_plug(prev)) {
+				raw_spin_unlock(&rq->lock);
+				blk_flush_plug(prev);
+				raw_spin_lock(&rq->lock);
+			}
 		}
 		switch_count = &prev->nvcsw;
 	}
 
-	/*
-	 * If we are going to sleep and we have plugged IO queued, make
-	 * sure to submit it to avoid deadlocks.
-	 */
-	if (prev->state != TASK_RUNNING && blk_needs_flush_plug(prev)) {
-		raw_spin_unlock(&rq->lock);
-		blk_flush_plug(prev);
-		raw_spin_lock(&rq->lock);
-	}
-
 	pre_schedule(rq, prev);
 
 	if (unlikely(!rq->nr_running))