Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue

Ming Lei <ming.lei@xxxxxxxxxx> · Wed, 8 Nov 2017 10:10:07 +0800

On Tue, Nov 07, 2017 at 10:27:11AM +0800, Ming Lei wrote:
> On Mon, Nov 06, 2017 at 04:34:21PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-11-06 at 11:44 +0800, Ming Lei wrote:
> > > On Sun, Nov 05, 2017 at 03:38:49PM +0000, Bart Van Assche wrote:
> > > > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > > index 048be4aa6024..0b121f29e3b1 100644
> > > > > --- a/block/blk-core.c
> > > > > +++ b/block/blk-core.c
> > > > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > >  	spin_unlock_irq(lock);
> > > > >  
> > > > > +	/* respect queue DEAD via quiesce for blk-mq */
> > > > > +	if (q->mq_ops)
> > > > > +		blk_mq_quiesce_queue(q);
> > > > > +
> > > > >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> > > > >  	blk_flush_integrity();
> > > > 
> > > > Have you considered to change the blk_freeze_queue_start() call in
> > > > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> > > > advantage that no new if (q->mq_ops) test has to be introduced.
> > > 
> > > That approach isn't nothing to do with this issue, and can't fix this issue
> > > too.
> > 
> > Sorry but I disagree. My opinion is that the change I proposed is a more
> > elegant way to avoid that blk_mq_run_hw_queue() tries to execute a request
> > after blk_cleanup_queue() has started.
> > 
> > > Not mention we hold q->sysfs_lock before calling blk_set_queue_dying(),
> > > there may be risk to cause deadlock.
> > 
> > Sorry but I disagree again. q->sysfs_lock is not obtained from inside any
> > code that runs the queue so there is no deadlock risk. Additionally, I doubt
> > that it is useful to obtain q->sysfs_lock from inside blk_cleanup_queue().
> > The queue attributes that are exported through sysfs can be modified safely
> > until these sysfs attributes are removed.
> > 
> > > The issue is that there isn't any request in queue(queue is frozen), but
> > > dispatch still may happen, let me explain it a bit:
> > > 
> > > 1) there are several IO submit paths in-progress
> > > 2) requests from all these paths are inserted to queue, but may dispatch to
> > > LLD in only one of these paths, but other paths may still move on to dispatch
> > > even all these requests are completed(that means blk_mq_freeze_queue_wait()
> > > returns at that time)
> > > 3) the dispatch after queue dead happens and causes the use-after-free,
> > > because we never respect queue dead for blk-mq.
> > 
> > After a queue has been marked "dying" and after blk_freeze_queue() has
> > returned it is guaranteed that all requests have finished and that all future
> > blk_get_request() calls will fail. Hence .queue_rq() won't be called anymore.
> > Any code that runs a queue (the __blk_mq_run_hw_queue() / blk_mq_make_request()
> > callers) must hold a reference on that queue. Running a queue even after
> 
> The reference is released via blk_queue_exit() when the request is freed in
> blk_mq_free_request(), but the dispatch may still be in-progress after the request
> is freed. I suggest you to take a look at the following words carefully,
> and if you think something is wrong, please comment on it directly:
> 
> 	1) there are several IO submit paths in-progress
> 
> 	2) requests from all these paths are inserted to queue, but may dispatch to
> 	LLD in only one of these paths, but other paths may still move on to dispatch
> 	even all these requests are completed(that means blk_mq_freeze_queue_wait()
> 	returns at that time)
> 
> 	3) the dispatch after queue dead happens and causes the use-after-free,
> 	because we never respect queue dead for blk-mq.
> 
> blk_freeze_queue() can make sure that no .queue_rq() can be run, but
> can't make sure the other dispatch part(like dequeue, check if there is
> work in queue, ...) is finished.

Hi Jens,

Please consider this patch for V4.14, since the in-progress dispatch
after queue DEAD may cause memory corruption(some operations on ctx & lock
may write to the freed memory).

Thanks,
Ming