Re: [PATCH 7/8] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter

Ming Lei <ming.lei@xxxxxxxxxx> · Mon, 26 Apr 2021 08:41:52 +0800

On Sun, Apr 25, 2021 at 11:55:22AM -0700, Bart Van Assche wrote:
> On 4/25/21 1:57 AM, Ming Lei wrote:
> > However, still one request UAF not covered: refcount_inc_not_zero() may
> > read one freed request, and it will be handled in next patch.
> 
> This means that patch "blk-mq: clear stale request in tags->rq[] before
> freeing one request pool" should come before this patch.

It doesn't matter. That patch alone can't avoid the UAF either; we also
need to grab req->ref to prevent the queue from being frozen.

> 
> > @@ -276,12 +277,15 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> >  		rq = tags->static_rqs[bitnr];
> >  	else
> >  		rq = tags->rqs[bitnr];
> > -	if (!rq)
> > +	if (!rq || !refcount_inc_not_zero(&rq->ref))
> >  		return true;
> >  	if ((iter_data->flags & BT_TAG_ITER_STARTED) &&
> >  	    !blk_mq_request_started(rq))
> > -		return true;
> > -	return iter_data->fn(rq, iter_data->data, reserved);
> > +		ret = true;
> > +	else
> > +		ret = iter_data->fn(rq, iter_data->data, reserved);
> > +	blk_mq_put_rq_ref(rq);
> > +	return ret;
> >  }
> 
> Even if patches 7/8 and 8/8 would be reordered, the above code
> introduces a new use-after-free, a use-after-free that is much worse
> than the UAF in kernel v5.11. The following sequence can be triggered by
> the above code:
> * bt_tags_iter() reads tags->rqs[bitnr] and stores the request pointer
> in the 'rq' variable.
> * Request 'rq' completes, tags->rqs[bitnr] is cleared and the memory
> that backs that request is freed.
> * The memory that backs 'rq' is used for another purpose and the request
> reference count becomes nonzero.

That means 'rq' has been re-allocated and has become in-flight again.

> * bt_tags_iter() increments the request reference count and thereby
> corrupts memory.

No. Once refcount_inc_not_zero() succeeds in bt_tags_iter(), nobody can
free the request until blk_mq_put_rq_ref() drops that reference after
->fn() returns, so why do you think memory gets corrupted? This pattern
is no different from the timeout handling code's usage, is it?
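
A minimal userspace sketch of the "increment only if nonzero, put when
done" pattern, for illustration only. It assumes C11 atomics as a
stand-in for refcount_t; struct fake_rq, fake_rq_tryget() and
fake_rq_put() are hypothetical names, not kernel APIs. It also assumes
the memory behind 'rq' is still valid when the try-get runs, which is
the separate issue the next patch is meant to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a request; ref == 0 means "free". */
struct fake_rq {
	atomic_uint ref;
	int tag;
};

/*
 * Like refcount_inc_not_zero(): take a reference only if the count is
 * still nonzero. Once this returns true, the owner cannot drop the
 * count to zero (and free/recycle the request) until we put it.
 */
static bool fake_rq_tryget(struct fake_rq *rq)
{
	unsigned int old = atomic_load(&rq->ref);

	do {
		if (old == 0)
			return false;	/* already on its way to being freed */
	} while (!atomic_compare_exchange_weak(&rq->ref, &old, old + 1));
	return true;
}

/* Drop one reference; returns true if it was the last one. */
static bool fake_rq_put(struct fake_rq *rq)
{
	unsigned int old = atomic_fetch_sub(&rq->ref, 1);

	assert(old > 0);
	return old == 1;	/* the last putter would free/recycle rq */
}
```

The point of the model: between a successful fake_rq_tryget() and the
matching fake_rq_put(), the count can never reach zero, so whatever
runs in between (->fn() in bt_tags_iter()) sees a request that is not
freed underneath it.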

If I/O is allowed while tagset requests are being iterated, ->fn() can
always run concurrently with in-flight I/O, and it is the caller's
responsibility to handle that race. That is why many callers quiesce
their queues before calling blk_mq_tagset_busy_iter(); quiescing isn't
required if ->fn() only reads the request.

Your patch and the current in-tree code have the same 'problem', if you
consider it one. Clearing ->rqs[tag] or holding a lock before calling
->fn() can't prevent it, can it?

Finally, this is a tagset-wide request walk, so it should be safe for
->fn() to iterate over requests this way. The only issue is that
req->tag may no longer equal 'bitnr'. We can handle that simply by
checking 'req->tag == bitnr' in bt_tags_iter() after req->ref is
grabbed, though I am still not sure that check is absolutely necessary.
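
A rough userspace sketch of where such a 'req->tag == bitnr' recheck
could sit, assuming the same kind of hypothetical model as above
(struct fake_rq, fake_tags_iter_one() and count_busy() are illustrative
names, not the kernel's bt_tags_iter() or blk-mq types). The tag is
read only after the reference is held, since at that point the request
can no longer be freed and re-allocated to another tag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request model: ref == 0 means free. */
struct fake_rq {
	atomic_uint ref;
	unsigned int tag;	/* tag currently owned by this request */
};

typedef bool (*busy_fn)(struct fake_rq *rq, void *data);

static bool fake_rq_tryget(struct fake_rq *rq)
{
	unsigned int old = atomic_load(&rq->ref);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&rq->ref, &old, old + 1));
	return true;
}

static void fake_rq_put(struct fake_rq *rq)
{
	unsigned int old = atomic_fetch_sub(&rq->ref, 1);

	assert(old > 0);
}

/*
 * One iteration step: pin the request, then skip it if it now belongs
 * to a different tag than the bit we are visiting.
 */
static bool fake_tags_iter_one(struct fake_rq **rqs, unsigned int bitnr,
			       busy_fn fn, void *data)
{
	struct fake_rq *rq = rqs[bitnr];
	bool ret = true;

	if (!rq || !fake_rq_tryget(rq))
		return true;		/* empty slot or request being freed */
	if (rq->tag == bitnr)		/* still the request for this bit? */
		ret = fn(rq, data);
	fake_rq_put(rq);
	return ret;			/* false stops the walk */
}

/* Example read-only callback: count the requests visited. */
static bool count_busy(struct fake_rq *rq, void *data)
{
	(void)rq;
	++*(unsigned int *)data;
	return true;			/* keep iterating */
}
```

Whether the extra check is needed depends on whether a given ->fn()
cares that the request it sees was reached through a stale bit.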

Thanks,
Ming