Re: RAID1 might_sleep() warning on 3.19-rc7

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 10 Feb 2015 10:29:36 +0100 Peter Zijlstra <peterz@xxxxxxxxxxxxx>
wrote:

> On Tue, Feb 10, 2015 at 01:50:17PM +1100, NeilBrown wrote:
> > On Mon, 9 Feb 2015 10:10:00 +0100 Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > > However, when io_schedule() explicitly calls blk_flush_plug(), then
> > > > @from_schedule=false variant is used, and the unplug functions are allowed to
> > > > allocate memory and block and maybe even call mempool_alloc() which might
> > > > call io_schedule().
> > > > 
> > > > This shouldn't be a problem as blk_flush_plug() spliced out the plug list, so
> > > > any recursive call will find an empty list and do nothing.
> > > 
> > > Unless, something along the way stuck something back on, right? So
> > > should we stick an:
> > > 
> > > 	WARN_ON(current->in_iowait);
> > > 
> > > somewhere near where things are added to this plug list? (and move the
> > > blk_flush_plug() call inside of where that's actually true of course).
> > 
> > No, I don't think so.
> > 
> > It is certainly possible that some request on plug->cb_list could add
> > something to plug->list - which is processed after ->cb_list.
> > 
> > I think the best way to think about this is that the *problem* was that a
> > wait_event loop could spin without making any progress.   So any time that
> > clear forward progress is made it is safe sleep without necessitating the
> > warning.  Hence sched_annotate_sleep() is reasonable.
> > blk_flush_plug() with definitely have dispatched some requests if it
> > might_sleep(), so the sleep is OK.
> 
> Well, yes, but you forget that this gets us back into recursion land.
> io_schedule() calling io_schedule() calling io_schedule() and *boom*
> stack overflow -> dead machine.
> 
> We must either guarantee io_schedule() will never call io_schedule() or
> that io_schedule() itself will not add new work to the current plug such
> that calling io_schedule() itself will not recurse on the blk stuff.
> 
> Pick either option, but pick one.

I choose ... Buzz Lightyear !!!

Sorry, go carried away there.  Uhhmm.  I think I pick a/  (But I expect I'll
find a goat... ho hum).

Does this look credible?

Thanks,
NeilBrown


From: NeilBrown <neilb@xxxxxxx>
Date: Fri, 13 Feb 2015 15:49:17 +1100
Subject: [PATCH] sched: prevent recursion in io_schedule()

io_schedule() calls blk_flush_plug() which, depending on the
contents of current->plug, can initiate arbitrary blk-io requests.

Note that this contrasts with blk_schedule_flush_plug() which requires
all non-trivial work to be handed off to a separate thread.

This makes it possible for io_schedule() to recurse, and initiating
block requests could possibly call mempool_alloc() which, in times of
memory pressure, uses io_schedule().

Apart from any stack usage issues, io_schedule() will not behave
correctly when called recursively as delayacct_blkio_start() does
not allow for repeated calls.

So:
 - use in_iowait to detect recursion.  Set it earlier, and restore
   it to the old value.
 - move the call to "raw_rq" after the call to blk_flush_plug().
   As this is some sort of per-cpu thing, we want some chance that
   we are on the right CPU
 - When io_schedule() is called recurively, use blk_schedule_flush_plug()
   which cannot further recurse.
 - as this makes io_schedule() a lot more complex and as io_schedule()
   must match io_schedule_timeout(), but all the changes in io_schedule_timeout()
   and make io_schedule a simple wrapper for that.

Signed-off-by: NeilBrown <neilb@xxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f37fe7f77a4..90f3de8bc7ca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4420,30 +4420,27 @@ EXPORT_SYMBOL_GPL(yield_to);
  */
 void __sched io_schedule(void)
 {
-	struct rq *rq = raw_rq();
-
-	delayacct_blkio_start();
-	atomic_inc(&rq->nr_iowait);
-	blk_flush_plug(current);
-	current->in_iowait = 1;
-	schedule();
-	current->in_iowait = 0;
-	atomic_dec(&rq->nr_iowait);
-	delayacct_blkio_end();
+	io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
 }
 EXPORT_SYMBOL(io_schedule);
 
 long __sched io_schedule_timeout(long timeout)
 {
-	struct rq *rq = raw_rq();
+	struct rq *rq;
 	long ret;
+	int old_iowait = current->in_iowait;
+
+	current->in_iowait = 1;
+	if (old_iowait)
+		blk_schedule_flush_plug(current);
+	else
+		blk_flush_plug(current);
 
 	delayacct_blkio_start();
+	rq = raw_rq();
 	atomic_inc(&rq->nr_iowait);
-	blk_flush_plug(current);
-	current->in_iowait = 1;
 	ret = schedule_timeout(timeout);
-	current->in_iowait = 0;
+	current->in_iowait = old_iowait;
 	atomic_dec(&rq->nr_iowait);
 	delayacct_blkio_end();
 	return ret;

Attachment: pgpn8j2rversJ.pgp
Description: OpenPGP digital signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux