Re: blk-mq request allocation stalls [was: Re: [PATCH v3 0/8] dm: add request-based blk-mq support]

Jens Axboe <axboe@xxxxxxxxx> · Wed, 07 Jan 2015 10:35:30 -0700

On 01/07/2015 10:18 AM, Mike Snitzer wrote:
On Wed, Jan 07 2015 at 11:24am -0500,
Jens Axboe <axboe@xxxxxxxxx> wrote:

On 01/07/2015 09:22 AM, Mike Snitzer wrote:
On Wed, Jan 07 2015 at 11:15am -0500,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

On Wed, Jan 07 2015 at 10:32am -0500,
Jens Axboe <axboe@xxxxxxxxx> wrote:

You forgot dm-1, that's what mkfs is sleeping on. But given that
sdc/sdd look idle, it still looks like a case of dm-1 not
appropriately running the queues after insertion.

DM never directly runs the queues of the underlying SCSI devices
(e.g. sdc, sdd).

request-based DM runs the DM device's queue on completion of a clone
request:

dm_end_request -> rq_completed -> blk_run_queue_async

Which ultimately does seem to be the wrong way around (like you say:
queue should run after insertion).

Hmm, for q->mq_ops blk_insert_cloned_request() should already be running
the queue.

blk_insert_cloned_request is calling blk_mq_insert_request(rq, false, true, true);

Third arg being @run_queue which results in blk_mq_run_hw_queue() being
called.

OK, that should be fine then. In that case, it's probably a missing
queue run in some other condition... Which does make more sense,
since "most" of the runs Bart did looked fine, it's just a slow one
every now and then.

The one case that looks suspect (missing queue run) is if/when the
multipath target's mapping function returns DM_MAPIO_REQUEUE.  It only
returns this if memory allocation failed (e.g. blk_get_request returns
ENOMEM).  Not seeing why DM core wouldn't want to re-run the DM device's
queue in this case given it just called blk_requeue_request().  But that
seems like something I need to revisit and not the ultimate problem.

Looking closer, why not have blk_run_queue() (and blk_run_queue_async)
call blk_mq_start_stopped_hw_queues() if q->mq_ops?  As is
scsi_run_queue() open-codes it.

Actually, that is likely the ultimate problem: blk_run_queue* aren't
trained for q->mq_ops.  As such DM would need to open code a call to
blk_mq_start_stopped_hw_queues.

It's not completely parallel how SCSI uses it. blk_run_queue(), for 
legacy request_fn, does not start stopped queues. For the mq path, 
scsi-mq decided to do that. So if we embed 
blk_mq_start_stopped_hw_queues() in blk_run_queue*(), then we'd have 
different behavior between mq and non-mq. We could have 
blk_start_queue() do the right thing, but that would require different 
contexts between mq and non-mq, as non-mq requires that to be called 
with the queue lock held and ints disabled.

--
Jens Axboe

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel