On Thu, Dec 06 2018 at 11:06pm -0500,
Jens Axboe <axboe@xxxxxxxxx> wrote:

> On 12/6/18 8:54 PM, Mike Snitzer wrote:
> > On Thu, Dec 06 2018 at 9:49pm -0500,
> > Jens Axboe <axboe@xxxxxxxxx> wrote:
> >
> >> After the direct dispatch corruption fix, we permanently disallow direct
> >> dispatch of non read/write requests. This works fine off the normal IO
> >> path, as they will be retried like any other failed direct dispatch
> >> request. But for the blk_insert_cloned_request() that only DM uses to
> >> bypass the bottom level scheduler, we always first attempt direct
> >> dispatch. For some types of requests, that's now a permanent failure,
> >> and no amount of retrying will make that succeed.
> >>
> >> Use the driver private RQF_DONTPREP to track this condition in DM. If
> >> we encounter a BUSY condition from blk_insert_cloned_request(), then
> >> flag the request with RQF_DONTPREP. When we next time see this request,
> >> ask blk_insert_cloned_request() to bypass insert the request directly.
> >> This avoids the livelock of repeatedly trying to direct dispatch a
> >> request, while still retaining the BUSY feedback loop for blk-mq so
> >> that we don't over-dispatch to the lower level queue and mess up
> >> opportunities for merging on the DM queue.
> >>
> >> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> >> Reported-by: Bart Van Assche <bvanassche@xxxxxxx>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> >>
> >> ---
> >>
> >> This passes my testing as well, like the previous patch. But unlike the
> >> previous patch, we retain the BUSY feedback loop information for better
> >> merging.
> >
> > But it is kind of gross to workaround the new behaviour to "permanently
> > disallow direct dispatch of non read/write requests" by always failing
> > such requests back to DM for later immediate direct dispatch.  That
> > bouncing of the request was acceptable when there was load-based
> > justification for having to retry (and in doing so: taking the cost of
> > freeing the clone request gotten via get_request() from the underlying
> > request_queues).
> >
> > Having to retry like this purely because the request isn't a read or
> > write seems costly.. every non-read-write will have implied
> > request_queue bouncing.  In multipath's case: it could select an
> > entirely different underlying path the next time it is destaged (with
> > RQF_DONTPREP set).  Which you'd think would negate all hope of IO
> > merging based performance improvements -- but that is a tangent I'll
> > need to ask Ming about (again).
> >
> > I really don't like this business of bouncing requests as a workaround
> > for the recent implementation of the corruption fix.
> >
> > Why not just add an override flag to _really_ allow direct dispatch for
> > _all_ types of requests?
> >
> > (just peeked at linux-block and it is looking like you took
> > jianchao.wang's series to avoid this hack... ;)
> >
> > Awesome.. my work is done for tonight!
>
> The whole point is doing something that is palatable to 4.20 and leaving
> the more experimental stuff to 4.21, where we have some weeks to verify
> that there are no conditions that cause IO stalls. I don't envision there
> will be, but I'm not willing to risk it this late in the 4.20 cycle.
>
> That said, this isn't a quick and dirty and I don't think it's fair
> calling this a hack. Using RQF_DONTPREP is quite common in drivers to
> retain state over multiple ->queue_rq invocations. Using it to avoid
> multiple direct dispatch failures (and obviously this new livelock)
> seems fine to me.

But it bounces IO purely because it is non-read-write.  That results in
guaranteed multiple blk_get_request() -- from the underlying
request_queues request-based DM is stacked on -- for every
non-read-write IO that is cloned.  That seems pathological.  I must
still be missing something.
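To spell out the round trip I'm describing, here is roughly how I read
the dm-rq side once your patch is applied.  Condensed sketch only, not
your actual diff: the function is made up, stats and error handling are
trimmed, and I'm hand-waving how the bypass gets communicated to
blk_insert_cloned_request() on the second pass.

/*
 * Condensed, hypothetical sketch of request-based DM dispatch with
 * RQF_DONTPREP tracking folded in -- shorthand for the extra round
 * trip, not the real code.
 */
static int map_request_sketch(struct dm_rq_target_io *tio)
{
	struct request *rq = tio->orig;
	struct request *clone = tio->clone;	/* set up against an underlying
						 * queue via blk_get_request() */
	blk_status_t r;

	r = blk_insert_cloned_request(clone->q, clone);
	switch (r) {
	case BLK_STS_OK:
		return DM_MAPIO_REMAPPED;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
		/*
		 * With the read/write-only restriction this is now a
		 * guaranteed first-pass failure for every non-read-write
		 * request: flag it so the next pass asks for bypass
		 * insert...
		 */
		rq->rq_flags |= RQF_DONTPREP;
		/*
		 * ...but the clone is torn down and the original requeued,
		 * so the next pass allocates a fresh clone from a (possibly
		 * different) underlying request_queue.  That is the
		 * guaranteed extra blk_get_request() bounce.
		 */
		blk_rq_unprep_clone(clone);
		tio->ti->type->release_clone_rq(clone);
		return DM_MAPIO_REQUEUE;
	default:
		dm_complete_request(rq, r);
		return DM_MAPIO_KILL;
	}
}

Every non-read-write clone walks the BLK_STS_RESOURCE leg at least once
before it can be bypass-inserted, which is the cost I'm questioning.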
> I really don't want to go around and audit every driver for potential
> retained state over special commands, that's why the read+write thing is
> in place. It's the safe option, which is what we need right now.

Maybe leave the blk_mq_request_issue_directly() interface as it is,
non-read-write restriction and all, but export a new
__blk_mq_request_issue_directly() that _only_ blk_insert_cloned_request()
-- and future comparable users -- makes use of?

To me that is the best of both worlds: fix the corruption issue but
don't impose needless blk_get_request() dances for non-read-write IO
issued to dm-multipath.

Mike
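p.s. in case the split I mean above isn't clear, something like the
following -- completely untested, the internal helper name is made up,
and I know the read/write-only check really lives further down the
direct-issue path after ffe81d45322c; where it sits doesn't change the
shape of the interface:

/* new export: for blk_insert_cloned_request() (and future comparable
 * trusted callers) only -- no op-type restriction */
blk_status_t __blk_mq_request_issue_directly(struct request *rq)
{
	/* hypothetical name for the existing direct-issue machinery */
	return blk_mq_do_issue_directly(rq);
}

/* existing entry point keeps the conservative post-corruption-fix
 * behaviour for everyone else */
blk_status_t blk_mq_request_issue_directly(struct request *rq)
{
	if (req_op(rq) != REQ_OP_READ && req_op(rq) != REQ_OP_WRITE)
		return BLK_STS_RESOURCE;	/* the BUSY DM keeps hitting */

	return __blk_mq_request_issue_directly(rq);
}

blk_insert_cloned_request() would then call the __ variant and every
other caller keeps the safe behaviour.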