On Tue, 10 Feb 2015 10:29:36 +0100 Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > On Tue, Feb 10, 2015 at 01:50:17PM +1100, NeilBrown wrote: > > On Mon, 9 Feb 2015 10:10:00 +0100 Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > However, when io_schedule() explicitly calls blk_flush_plug(), then > > > > @from_schedule=false variant is used, and the unplug functions are allowed to > > > > allocate memory and block and maybe even call mempool_alloc() which might > > > > call io_schedule(). > > > > > > > > This shouldn't be a problem as blk_flush_plug() spliced out the plug list, so > > > > any recursive call will find an empty list and do nothing. > > > > > > Unless, something along the way stuck something back on, right? So > > > should we stick an: > > > > > > WARN_ON(current->in_iowait); > > > > > > somewhere near where things are added to this plug list? (and move the > > > blk_flush_plug() call inside of where that's actually true of course). > > > > No, I don't think so. > > > > It is certainly possible that some request on plug->cb_list could add > > something to plug->list - which is processed after ->cb_list. > > > > I think the best way to think about this is that the *problem* was that a > > wait_event loop could spin without making any progress. So any time that > > clear forward progress is made it is safe sleep without necessitating the > > warning. Hence sched_annotate_sleep() is reasonable. > > blk_flush_plug() with definitely have dispatched some requests if it > > might_sleep(), so the sleep is OK. > > Well, yes, but you forget that this gets us back into recursion land. > io_schedule() calling io_schedule() calling io_schedule() and *boom* > stack overflow -> dead machine. > > We must either guarantee io_schedule() will never call io_schedule() or > that io_schedule() itself will not add new work to the current plug such > that calling io_schedule() itself will not recurse on the blk stuff. > > Pick either option, but pick one. I choose ... Buzz Lightyear !!! Sorry, go carried away there. Uhhmm. I think I pick a/ (But I expect I'll find a goat... ho hum). Does this look credible? Thanks, NeilBrown From: NeilBrown <neilb@xxxxxxx> Date: Fri, 13 Feb 2015 15:49:17 +1100 Subject: [PATCH] sched: prevent recursion in io_schedule() io_schedule() calls blk_flush_plug() which, depending on the contents of current->plug, can initiate arbitrary blk-io requests. Note that this contrasts with blk_schedule_flush_plug() which requires all non-trivial work to be handed off to a separate thread. This makes it possible for io_schedule() to recurse, and initiating block requests could possibly call mempool_alloc() which, in times of memory pressure, uses io_schedule(). Apart from any stack usage issues, io_schedule() will not behave correctly when called recursively as delayacct_blkio_start() does not allow for repeated calls. So: - use in_iowait to detect recursion. Set it earlier, and restore it to the old value. - move the call to "raw_rq" after the call to blk_flush_plug(). As this is some sort of per-cpu thing, we want some chance that we are on the right CPU - When io_schedule() is called recurively, use blk_schedule_flush_plug() which cannot further recurse. - as this makes io_schedule() a lot more complex and as io_schedule() must match io_schedule_timeout(), but all the changes in io_schedule_timeout() and make io_schedule a simple wrapper for that. Signed-off-by: NeilBrown <neilb@xxxxxxx> Cc: Jens Axboe <axboe@xxxxxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 1f37fe7f77a4..90f3de8bc7ca 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4420,30 +4420,27 @@ EXPORT_SYMBOL_GPL(yield_to); */ void __sched io_schedule(void) { - struct rq *rq = raw_rq(); - - delayacct_blkio_start(); - atomic_inc(&rq->nr_iowait); - blk_flush_plug(current); - current->in_iowait = 1; - schedule(); - current->in_iowait = 0; - atomic_dec(&rq->nr_iowait); - delayacct_blkio_end(); + io_schedule_timeout(MAX_SCHEDULE_TIMEOUT); } EXPORT_SYMBOL(io_schedule); long __sched io_schedule_timeout(long timeout) { - struct rq *rq = raw_rq(); + struct rq *rq; long ret; + int old_iowait = current->in_iowait; + + current->in_iowait = 1; + if (old_iowait) + blk_schedule_flush_plug(current); + else + blk_flush_plug(current); delayacct_blkio_start(); + rq = raw_rq(); atomic_inc(&rq->nr_iowait); - blk_flush_plug(current); - current->in_iowait = 1; ret = schedule_timeout(timeout); - current->in_iowait = 0; + current->in_iowait = old_iowait; atomic_dec(&rq->nr_iowait); delayacct_blkio_end(); return ret;
Attachment:
pgpn8j2rversJ.pgp
Description: OpenPGP digital signature