Re: blk-mq request allocation stalls

Jens Axboe <axboe@xxxxxxxxx> · Mon, 12 Jan 2015 12:05:10 -0700

On 01/12/2015 11:22 AM, Keith Busch wrote:
> On Mon, 12 Jan 2015, Jens Axboe wrote:
>> On 01/12/2015 10:53 AM, Keith Busch wrote:
>>> Is the nr_active count correct prior to starting the mkfs test? Trying
>>> to see if someone is calling "blk_mq_alloc_tag_set()" twice on the same
>>> set. It might be good to add a WARN if this is detected anyway.
>>
>> That might be a good debug aid, I agree. But the above doesn't look
>> like it's corrupted. If you add the values, you get 60 and 62 for the
>> two cases, which seems to indicate that we did bump the values
>> correctly, but for some reason we never did the decrement on
>> completion. Hence we stabilize around the queue depth of the device,
>> which will be 62 +/- a bit due to the sharing.
>>
>> I'm not familiar with how rq based dm works. We clone the original
>> request (which has the RQ_MQ_INFLIGHT flag set), then we issue the
>> clone(s) to the underlying device(s)? And when that completes, we
>> complete the original? That would work fine with the flag on the
>> original request. Maybe I'm missing something, and I'll let more
>> knowledgeable people discuss that.
>
> Oh, let's look at "__blk_rq_prep_clone". dm calls that after
> blk_get_request() for the blk-mq based multipath types and overrides the
> destination's cmd_flags with the source's even though the source was not
> allocated from a blk-mq based queue, much less a shared tag.

Heh, I suck, I had read that but read it as |=. So yes, that would seem
to back up my missing flag theory.
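
To make the = versus |= point concrete, here is a small userspace sketch
of the accounting problem. It is not kernel code: the flag values, the
structs and every function name below are made up for illustration, and
it only assumes that the completion path decrements the active count
when the inflight flag is still set on the request.

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration; not the kernel's values. */
#define FLAG_NOMERGE     (1u << 0)
#define FLAG_MQ_INFLIGHT (1u << 1)  /* stands in for the inflight flag */

struct hctx { int nr_active; };
struct request { unsigned int cmd_flags; struct hctx *hctx; };

/* Allocation on a shared-tag queue: bump the count, mark the request. */
static void alloc_request(struct request *rq, struct hctx *h)
{
    rq->hctx = h;
    rq->cmd_flags = FLAG_MQ_INFLIGHT;
    h->nr_active++;
}

/* Completion: the decrement happens only if the flag survived. */
static void free_request(struct request *rq)
{
    if (rq->cmd_flags & FLAG_MQ_INFLIGHT) {
        rq->hctx->nr_active--;
        rq->cmd_flags &= ~FLAG_MQ_INFLIGHT;
    }
}

/* Clone prep that overwrites the destination flags (plain =). */
static void prep_clone_assign(struct request *dst, const struct request *src)
{
    dst->cmd_flags = src->cmd_flags | FLAG_NOMERGE;
}

/* Clone prep that merges into the destination flags (|=). */
static void prep_clone_or(struct request *dst, const struct request *src)
{
    dst->cmd_flags |= src->cmd_flags | FLAG_NOMERGE;
}

/* Allocate, prep and complete n clones; return the leftover count. */
static int leaked_after(void (*prep)(struct request *,
                                     const struct request *), int n)
{
    struct hctx h = { 0 };
    /* The source was not allocated from a blk-mq queue: no inflight flag. */
    struct request src = { 0, NULL };

    for (int i = 0; i < n; i++) {
        struct request clone;

        alloc_request(&clone, &h);
        prep(&clone, &src);
        free_request(&clone);
    }
    return h.nr_active;
}
```

With these definitions, leaked_after(prep_clone_assign, 62) returns 62
because the overwrite drops the flag before completion, while
leaked_after(prep_clone_or, 62) returns 0.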

--
Jens Axboe

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel