On Thu, Dec 06 2018 at 8:34pm -0500,
Jens Axboe <axboe@xxxxxxxxx> wrote:

> On 12/6/18 6:22 PM, jianchao.wang wrote:
> >
> >
> > On 12/7/18 9:13 AM, Jens Axboe wrote:
> >> On 12/6/18 6:04 PM, jianchao.wang wrote:
> >>>
> >>>
> >>> On 12/7/18 6:20 AM, Jens Axboe wrote:
> >>>> After the direct dispatch corruption fix, we permanently disallow direct
> >>>> dispatch of non read/write requests. This works fine off the normal IO
> >>>> path, as they will be retried like any other failed direct dispatch
> >>>> request. But for the blk_insert_cloned_request() that only DM uses to
> >>>> bypass the bottom level scheduler, we always first attempt direct
> >>>> dispatch. For some types of requests, that's now a permanent failure,
> >>>> and no amount of retrying will make that succeed.
> >>>>
> >>>> Don't use direct dispatch off the cloned insert path, always just use
> >>>> bypass inserts. This still bypasses the bottom level scheduler, which is
> >>>> what DM wants.
> >>>>
> >>>> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> >>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> >>>>
> >>>> ---
> >>>>
> >>>> diff --git a/block/blk-core.c b/block/blk-core.c
> >>>> index deb56932f8c4..4c44e6fa0d08 100644
> >>>> --- a/block/blk-core.c
> >>>> +++ b/block/blk-core.c
> >>>> @@ -2637,7 +2637,8 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
> >>>>  		 * bypass a potential scheduler on the bottom device for
> >>>>  		 * insert.
> >>>>  		 */
> >>>> -		return blk_mq_request_issue_directly(rq);
> >>>> +		blk_mq_request_bypass_insert(rq, true);
> >>>> +		return BLK_STS_OK;
> >>>>  	}
> >>>>
> >>>>  	spin_lock_irqsave(q->queue_lock, flags);
> >>>>
> >>> Not sure about this because it will break the merging promotion for request based DM
> >>> from Ming.
> >>> 396eaf21ee17c476e8f66249fb1f4a39003d0ab4
> >>> (blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback)
> >>>
> >>> We could use some other way to fix this.
> >>
> >> That really shouldn't matter as this is the cloned insert, merging should
> >> have been done on the original request.
> >>
> >>
> > Just quote some comments from the patch.
> >
> > "
> > But dm-rq currently can't get the underlying queue's
> > dispatch feedback at all. Without knowing whether a request was issued
> > or not (e.g. due to underlying queue being busy) the dm-rq elevator will
> > not be able to provide effective IO merging (as a side-effect of dm-rq
> > currently blindly destaging a request from its elevator only to requeue
> > it after a delay, which kills any opportunity for merging). This
> > obviously causes very bad sequential IO performance.
> > ...
> > With this, request-based DM's blk-mq sequential IO performance is vastly
> > improved (as much as 3X in mpath/virtio-scsi testing)
> > "
> >
> > Using blk_mq_request_bypass_insert to replace the blk_mq_request_issue_directly
> > could be a fast method to fix the current issue. Maybe we could get the merging
> > promotion back after some time.
>
> This really sucks, mostly because DM wants to have it both ways - not use
> the bottom level IO scheduler, but still actually use it if it makes sense.

Well no, that isn't what DM is doing.  DM does have an upper layer
scheduler that would like to be afforded the same capabilities that any
request-based driver is given.  Yes, that comes with plumbing in safe
passage for upper layer requests dispatched from a stacked blk-mq IO
scheduler.
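
To make that concrete: the capability in question is the dispatch
feedback that commit 396eaf21ee17 plumbed through.  Request-based DM
looks at the status returned by blk_insert_cloned_request() and, when
the underlying queue is busy, requeues the original request through its
own elevator so more IO can be merged into it.  Roughly this shape (a
from-memory sketch of the relevant bit of drivers/md/dm-rq.c, not a
verbatim copy of the code):

	blk_status_t ret = blk_insert_cloned_request(clone->q, clone);

	switch (ret) {
	case BLK_STS_OK:
		break;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
		/*
		 * Underlying queue was busy: requeue the original request
		 * in DM's elevator so more IO can be merged into it before
		 * the next dispatch attempt.
		 */
		return DM_MAPIO_REQUEUE;
	default:
		/* anything else is a hard failure */
		return DM_MAPIO_KILL;
	}

If blk_insert_cloned_request() unconditionally returns BLK_STS_OK, the
BLK_STS_RESOURCE/BLK_STS_DEV_RESOURCE arm above becomes dead code and
the merging win described in that commit header is lost.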

> There is another way to fix this - still do the direct dispatch, but have
> dm track if it failed and do bypass insert in that case. I didn't want to
> do that since it's more involved, but it's doable.
>
> Let me cook that up and test it... Don't like it, though.

Not following how DM can track if issuing the request worked if it is
always told it worked with BLK_STS_OK.  We care about feedback when the
request is actually issued because of the elaborate way blk-mq
elevators work.

DM is forced to worry about all these details, as covered some in the
header for commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4; it is
trying to have its cake and eat it too.  It just wants IO scheduling to
work for request-based DM devices.  That's it.

Mike