On 2011-04-13 13:12, Peter Zijlstra wrote: > On Tue, 2011-04-12 at 19:23 -0700, Linus Torvalds wrote: >> kernel/sched.c | 20 ++++++++++---------- >> 1 files changed, 10 insertions(+), 10 deletions(-) >> >> diff --git a/kernel/sched.c b/kernel/sched.c >> index 48013633d792..a187c3fe027b 100644 >> --- a/kernel/sched.c >> +++ b/kernel/sched.c >> @@ -4111,20 +4111,20 @@ need_resched: >> try_to_wake_up_local(to_wakeup); >> } >> deactivate_task(rq, prev, DEQUEUE_SLEEP); >> + >> + /* >> + * If we are going to sleep and we have plugged IO queued, make >> + * sure to submit it to avoid deadlocks. >> + */ >> + if (blk_needs_flush_plug(prev)) { >> + raw_spin_unlock(&rq->lock); >> + blk_flush_plug(prev); >> + raw_spin_lock(&rq->lock); >> + } >> } >> switch_count = &prev->nvcsw; >> } >> >> - /* >> - * If we are going to sleep and we have plugged IO queued, make >> - * sure to submit it to avoid deadlocks. >> - */ >> - if (prev->state != TASK_RUNNING && blk_needs_flush_plug(prev)) { >> - raw_spin_unlock(&rq->lock); >> - blk_flush_plug(prev); >> - raw_spin_lock(&rq->lock); >> - } >> - >> pre_schedule(rq, prev); >> >> if (unlikely(!rq->nr_running)) > > Right, that cures the preemption problem. The reason I suggested placing > it where it was is that I'd like to keep all things that release > rq->lock in the middle of schedule() in one place, but I guess we can > cure that with some extra comments. We definitely only want to do it on going to sleep, not preempt events. So if you are fine with this change, then lets please do that. Linus, I've got a few other things queued up in the area, I'll add this and send them off soon. Or feel free to add this one yourself, since you already did it. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html