Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

Bart Van Assche <Bart.VanAssche@xxxxxxx> · Mon, 4 Sep 2017 15:40:35 +0000

On Mon, 2017-09-04 at 15:16 +0800, Ming Lei wrote:
> On Mon, Sep 04, 2017 at 04:13:26AM +0000, Bart Van Assche wrote:
> > Allowing blk_get_request() to succeed after the DYING flag has been set is
> > completely wrong because that could result in a request being queued after
> > the DEAD flag has been set, resulting in either a hanging request or a kernel
> > crash. This is why it's completely wrong to add a blk_queue_enter_live() call
> > in blk_old_get_request() or blk_mq_alloc_request(). Hence my NAK for any
> > patch that adds a blk_queue_enter_live() call to any function called from
> > blk_get_request(). That includes the patch at the start of this e-mail thread.
>
> See above, this patch changes nothing about this fact, please look at
> the patch carefully next time just before posting your long comment.

Are you really sure that your patch does not allow blk_get_request() to
succeed after the DYING flag has been set? blk_mq_alloc_request() calls both
blk_queue_is_preempt_frozen() and blk_queue_enter_live() without holding
any lock. A thread that is running concurrently with blk_mq_get_request()
can unfreeze the queue after blk_queue_is_preempt_frozen() returned and
before blk_queue_enter_live() is called. This means that with your patch
series applied blk_get_request() can succeed after the DYING flag has been
set, which is something we don't want. Additionally, I don't think we want
to introduce any kind of locking in blk_mq_get_request() because that would
be a serialization point.

Have you considered to use the blk-mq "reserved request" mechanism to avoid
starvation of power management requests instead of making the block layer
even more complicated than it already is?

Note: extending blk_mq_freeze/unfreeze_queue() to the legacy block layer
could be useful to make scsi_wait_for_queuecommand() more elegant. However,
I don't think we should spend our time on legacy block layer / SCSI core
changes. The code I'm referring to is the following:

/**
 * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
 * @sdev: SCSI device pointer.
 *
 * Wait until the ongoing shost->hostt->queuecommand() calls that are
 * invoked from scsi_request_fn() have finished.
 */
static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
{
	WARN_ON_ONCE(sdev->host->use_blk_mq);

	while (scsi_request_fn_active(sdev))
		msleep(20);
}

Bart.