On Tue, Apr 27, 2021 at 01:17:06PM -0700, Bart Van Assche wrote:
> On 4/27/21 8:10 AM, Ming Lei wrote:
> > +void blk_mq_put_rq_ref(struct request *rq)
> > +{
> > +	if (is_flush_rq(rq, rq->mq_hctx))
> > +		rq->end_io(rq, 0);
> > +	else if (refcount_dec_and_test(&rq->ref))
> > +		__blk_mq_free_request(rq);
> > +}
>
> The above function needs more work. blk_mq_put_rq_ref() may be called from
> multiple CPUs concurrently and hence must handle concurrent calls safely.
> The flush .end_io callbacks have not been designed to handle concurrent
> calls.

static void flush_end_io(struct request *flush_rq, blk_status_t error)
{
	struct request_queue *q = flush_rq->q;
	struct list_head *running;
	struct request *rq, *n;
	unsigned long flags = 0;
	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);

	/* release the tag's ownership to the req cloned from */
	spin_lock_irqsave(&fq->mq_flush_lock, flags);

	if (!refcount_dec_and_test(&flush_rq->ref)) {
		fq->rq_status = error;
		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
		return;
	}

	...

	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}

Both the spin lock and refcount_dec_and_test() are run at the very beginning
of flush_end_io(), so concurrent calls are handled safely: only the caller
that drops the last reference proceeds past that check, while any other
concurrent caller just records the status and returns under the lock.

If there were a problem here, it would equally be an issue between normal
completion and timeout, because the pattern used in this patch is the same
as the one the timeout path already uses.

Or am I missing something?

Thanks,
Ming
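
For illustration only, below is a small userspace model of the pattern
above (the names model_rq/model_end_io are made up and are not kernel
code): every caller takes a lock and does an atomic dec-and-test first,
so no matter how many callers race, exactly one of them runs the
completion path.

/*
 * Userspace sketch of the "lock + dec-and-test first" pattern used by
 * flush_end_io(): only the caller that releases the last reference runs
 * the completion work, all other concurrent callers return early.
 *
 * Build with: cc -pthread model.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct model_rq {
	atomic_int	ref;		/* stands in for rq->ref */
	pthread_mutex_t	lock;		/* stands in for fq->mq_flush_lock */
	int		completions;	/* times the "real" completion ran */
};

/* Returns true only for the caller that dropped the last reference. */
static bool model_end_io(struct model_rq *rq)
{
	bool last;

	pthread_mutex_lock(&rq->lock);
	/* fetch_sub returns the old value; old == 1 means last reference */
	last = (atomic_fetch_sub(&rq->ref, 1) == 1);
	if (last)
		rq->completions++;	/* the "..." part of flush_end_io() */
	pthread_mutex_unlock(&rq->lock);

	return last;
}

static void *put_ref(void *arg)
{
	model_end_io(arg);
	return NULL;
}

int main(void)
{
	struct model_rq rq = {
		.ref = 4,		/* pretend four owners hold a reference */
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.completions = 0,
	};
	pthread_t t[4];
	int i;

	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, put_ref, &rq);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);

	/* Exactly one of the racing callers completed the request. */
	printf("completions = %d (expect 1)\n", rq.completions);
	return 0;
}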