On Mon, Oct 31, 2022 at 04:42:11PM -0600, Jens Axboe wrote:
> On 10/31/22 4:12 PM, Al Viro wrote:
> > static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
> > {
> >         struct request *last = rq_list_peek(&plug->mq_list);
> >
> > Suppose it's not NULL...
> >
> >         if (!plug->rq_count) {
> >                 trace_block_plug(rq->q);
> >         } else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
> >                    (!blk_queue_nomerges(rq->q) &&
> >                     blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
> >
> > ... and we went here:
> >
> >                 blk_mq_flush_plug_list(plug, false);
> >
> > All requests, including the one last points to, might get fed ->queue_rq()
> > here.  At which point there seems to be nothing to prevent them getting
> > completed and freed on another CPU, possibly before we return here.
> >
> >                 trace_block_plug(rq->q);
> >         }
> >
> >         if (!plug->multiple_queues && last && last->q != rq->q)
> >
> > ... and here we dereference last.
> >
> > Shouldn't we reset last to NULL after the call of blk_mq_flush_plug_list()
> > above?
>
> There's no UAF here as the requests aren't freed. We could clear 'last'
> to make the code more explicit, and that would avoid any potential
> suboptimal behavior with ->multiple_queues being wrong.

Umm...  Suppose ->has_elevator is false and so is ->multiple_queues.
There's no ->queue_rqs(), so blk_mq_flush_plug_list() grabs rcu_read_lock()
and hits blk_mq_plug_issue_direct().  blk_mq_plug_issue_direct() picks the
first request off the list and passes it to blk_mq_request_issue_directly(),
which passes it to __blk_mq_request_issue_directly().  There we grab a tag
and proceed to __blk_mq_issue_directly(), which feeds the request to
->queue_rq().

What's to stop e.g. a worker on another CPU from picking that sucker up,
completing it and calling blk_mq_end_request(), which completes all bios
involved and calls blk_mq_free_request()?  If all of that manages to happen
before blk_mq_flush_plug_list() returns to its caller...

Sure, you probably won't hit it on bare metal, but if you are in a KVM
guest and this virtual CPU happens to lose the host timeslice...  I've seen
considerably narrower race windows getting hit on such setups.

Am I missing something subtle here?  It's been a long time since I've read
through that area - as a matter of fact, I'm trying to refresh my memory
of the submit_bio()-related code paths at the moment...
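
FWIW, to make the suggestion concrete: what I have in mind is no more than
the following (entirely untested sketch against the blk_add_rq_to_plug()
code quoted above; only the 'last = NULL' line is new, the rest is the
existing context):

        } else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
                   (!blk_queue_nomerges(rq->q) &&
                    blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
                blk_mq_flush_plug_list(plug, false);
                /* everything on the plug list, *last included, may be gone */
                last = NULL;
                trace_block_plug(rq->q);
        }

With that, the later "!plug->multiple_queues && last && last->q != rq->q"
check simply skips the comparison after a flush, which matches what the
(now empty) plug list actually contains.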