On Tue, Nov 07, 2017 at 10:27:11AM +0800, Ming Lei wrote: > On Mon, Nov 06, 2017 at 04:34:21PM +0000, Bart Van Assche wrote: > > On Mon, 2017-11-06 at 11:44 +0800, Ming Lei wrote: > > > On Sun, Nov 05, 2017 at 03:38:49PM +0000, Bart Van Assche wrote: > > > > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote: > > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > > > > index 048be4aa6024..0b121f29e3b1 100644 > > > > > --- a/block/blk-core.c > > > > > +++ b/block/blk-core.c > > > > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q) > > > > > queue_flag_set(QUEUE_FLAG_DEAD, q); > > > > > spin_unlock_irq(lock); > > > > > > > > > > + /* respect queue DEAD via quiesce for blk-mq */ > > > > > + if (q->mq_ops) > > > > > + blk_mq_quiesce_queue(q); > > > > > + > > > > > /* for synchronous bio-based driver finish in-flight integrity i/o */ > > > > > blk_flush_integrity(); > > > > > > > > Have you considered to change the blk_freeze_queue_start() call in > > > > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the > > > > advantage that no new if (q->mq_ops) test has to be introduced. > > > > > > That approach isn't nothing to do with this issue, and can't fix this issue > > > too. > > > > Sorry but I disagree. My opinion is that the change I proposed is a more > > elegant way to avoid that blk_mq_run_hw_queue() tries to execute a request > > after blk_cleanup_queue() has started. > > > > > Not mention we hold q->sysfs_lock before calling blk_set_queue_dying(), > > > there may be risk to cause deadlock. > > > > Sorry but I disagree again. q->sysfs_lock is not obtained from inside any > > code that runs the queue so there is no deadlock risk. Additionally, I doubt > > that it is useful to obtain q->sysfs_lock from inside blk_cleanup_queue(). > > The queue attributes that are exported through sysfs can be modified safely > > until these sysfs attributes are removed. > > > > > The issue is that there isn't any request in queue(queue is frozen), but > > > dispatch still may happen, let me explain it a bit: > > > > > > 1) there are several IO submit paths in-progress > > > 2) requests from all these paths are inserted to queue, but may dispatch to > > > LLD in only one of these paths, but other paths may still move on to dispatch > > > even all these requests are completed(that means blk_mq_freeze_queue_wait() > > > returns at that time) > > > 3) the dispatch after queue dead happens and causes the use-after-free, > > > because we never respect queue dead for blk-mq. > > > > After a queue has been marked "dying" and after blk_freeze_queue() has > > returned it is guaranteed that all requests have finished and that all future > > blk_get_request() calls will fail. Hence .queue_rq() won't be called anymore. > > Any code that runs a queue (the __blk_mq_run_hw_queue() / blk_mq_make_request() > > callers) must hold a reference on that queue. Running a queue even after > > The reference is released via blk_queue_exit() when the request is freed in > blk_mq_free_request(), but the dispatch may still be in-progress after the request > is freed. I suggest you to take a look at the following words carefully, > and if you think something is wrong, please comment on it directly: > > 1) there are several IO submit paths in-progress > > 2) requests from all these paths are inserted to queue, but may dispatch to > LLD in only one of these paths, but other paths may still move on to dispatch > even all these requests are completed(that means blk_mq_freeze_queue_wait() > returns at that time) > > 3) the dispatch after queue dead happens and causes the use-after-free, > because we never respect queue dead for blk-mq. > > blk_freeze_queue() can make sure that no .queue_rq() can be run, but > can't make sure the other dispatch part(like dequeue, check if there is > work in queue, ...) is finished. Hi Jens, Please consider this patch for V4.14, since the in-progress dispatch after queue DEAD may cause memory corruption(some operations on ctx & lock may write to the freed memory). Thanks, Ming