On Thu, Dec 06 2018 at 11:06pm -0500,
Jens Axboe <axboe@xxxxxxxxx> wrote:

> On 12/6/18 8:54 PM, Mike Snitzer wrote:
> > On Thu, Dec 06 2018 at 9:49pm -0500,
> > Jens Axboe <axboe@xxxxxxxxx> wrote:
> >
> >> After the direct dispatch corruption fix, we permanently disallow direct
> >> dispatch of non read/write requests. This works fine off the normal IO
> >> path, as they will be retried like any other failed direct dispatch
> >> request. But for the blk_insert_cloned_request() that only DM uses to
> >> bypass the bottom level scheduler, we always first attempt direct
> >> dispatch. For some types of requests, that's now a permanent failure,
> >> and no amount of retrying will make that succeed.
> >>
> >> Use the driver private RQF_DONTPREP to track this condition in DM. If
> >> we encounter a BUSY condition from blk_insert_cloned_request(), then
> >> flag the request with RQF_DONTPREP. When we next time see this request,
> >> ask blk_insert_cloned_request() to bypass insert the request directly.
> >> This avoids the livelock of repeatedly trying to direct dispatch a
> >> request, while still retaining the BUSY feedback loop for blk-mq so
> >> that we don't over-dispatch to the lower level queue and mess up
> >> opportunities for merging on the DM queue.
> >>
> >> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> >> Reported-by: Bart Van Assche <bvanassche@xxxxxxx>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> >>
> >> ---
> >>
> >> This passes my testing as well, like the previous patch. But unlike the
> >> previous patch, we retain the BUSY feedback loop information for better
> >> merging.
> >
> > But it is kind of gross to workaround the new behaviour to "permanently
> > disallow direct dispatch of non read/write requests" by always failing
> > such requests back to DM for later immediate direct dispatch.  That
> > bouncing of the request was acceptable when there was load-based
> > justification for having to retry (and in doing so: taking the cost of
> > freeing the clone request gotten via get_request() from the underlying
> > request_queues).
> >
> > Having to retry like this purely because the request isn't a read or
> > write seems costly.. every non-read-write will have implied
> > request_queue bouncing.  In multipath's case: it could select an
> > entirely different underlying path the next time it is destaged (with
> > RQF_DONTPREP set).  Which you'd think would negate all hope of IO
> > merging based performance improvements -- but that is a tangent I'll
> > need to ask Ming about (again).
> >
> > I really don't like this business of bouncing requests as a workaround
> > for the recent implementation of the corruption fix.
> >
> > Why not just add an override flag to _really_ allow direct dispatch for
> > _all_ types of requests?
> >
> > (just peeked at linux-block and it is looking like you took
> > jianchao.wang's series to avoid this hack... ;)
> >
> > Awesome.. my work is done for tonight!
>
> The whole point is doing something that is palatable to 4.20 and leaving
> the more experimental stuff to 4.21, where we have some weeks to verify
> that there are no conditions that cause IO stalls. I don't envision there
> will be, but I'm not willing to risk it this late in the 4.20 cycle.
>
> That said, this isn't a quick and dirty and I don't think it's fair
> calling this a hack. Using RQF_DONTPREP is quite common in drivers to
> retain state over multiple ->queue_rq invocations. Using it to avoid
> multiple direct dispatch failures (and obviously this new livelock)
> seems fine to me.

But it bounces IO purely because it is non-read-write.  That results in
guaranteed multiple blk_get_request() -- from the underlying
request_queues request-based DM is stacked on -- for every
non-read-write IO that is cloned.  That seems pathological.  I must
still be missing something.
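To spell out the round trip I'm describing, here is roughly how I read
the dm-rq side once your patch is applied.  Condensed sketch only, not
your actual diff: the function is made up, stats and error handling are
trimmed, and I'm hand-waving how the bypass gets communicated to
blk_insert_cloned_request() on the second pass.

/*
 * Condensed, hypothetical sketch of request-based DM dispatch with
 * RQF_DONTPREP tracking folded in -- shorthand for the extra round
 * trip, not the real code.
 */
static int map_request_sketch(struct dm_rq_target_io *tio)
{
	struct request *rq = tio->orig;
	struct request *clone = tio->clone;	/* set up against an underlying
						 * queue via blk_get_request() */
	blk_status_t r;

	r = blk_insert_cloned_request(clone->q, clone);
	switch (r) {
	case BLK_STS_OK:
		return DM_MAPIO_REMAPPED;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
		/*
		 * With the read/write-only restriction this is now a
		 * guaranteed first-pass failure for every non-read-write
		 * request: flag it so the next pass asks for bypass
		 * insert...
		 */
		rq->rq_flags |= RQF_DONTPREP;
		/*
		 * ...but the clone is torn down and the original requeued,
		 * so the next pass allocates a fresh clone from a (possibly
		 * different) underlying request_queue.  That is the
		 * guaranteed extra blk_get_request() bounce.
		 */
		blk_rq_unprep_clone(clone);
		tio->ti->type->release_clone_rq(clone);
		return DM_MAPIO_REQUEUE;
	default:
		dm_complete_request(rq, r);
		return DM_MAPIO_KILL;
	}
}

Every non-read-write clone walks the BLK_STS_RESOURCE leg at least once
before it can be bypass-inserted, which is the cost I'm questioning.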
> I really don't want to go around and audit every driver for potential
> retained state over special commands, that's why the read+write thing is
> in place. It's the safe option, which is what we need right now.

Maybe leave the blk_mq_request_issue_directly() interface as it is,
non-read-write restriction and all, but export a new
__blk_mq_request_issue_directly() that _only_ blk_insert_cloned_request()
-- and future comparable users -- makes use of?

To me that is the best of both worlds: fix the corruption issue but
don't impose needless blk_get_request() dances for non-read-write IO
issued to dm-multipath.

Mike
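p.s. in case the split I mean above isn't clear, something like the
following -- completely untested, the internal helper name is made up,
and I know the read/write-only check really lives further down the
direct-issue path after ffe81d45322c; where it sits doesn't change the
shape of the interface:

/* new export: for blk_insert_cloned_request() (and future comparable
 * trusted callers) only -- no op-type restriction */
blk_status_t __blk_mq_request_issue_directly(struct request *rq)
{
	/* hypothetical name for the existing direct-issue machinery */
	return blk_mq_do_issue_directly(rq);
}

/* existing entry point keeps the conservative post-corruption-fix
 * behaviour for everyone else */
blk_status_t blk_mq_request_issue_directly(struct request *rq)
{
	if (req_op(rq) != REQ_OP_READ && req_op(rq) != REQ_OP_WRITE)
		return BLK_STS_RESOURCE;	/* the BUSY DM keeps hitting */

	return __blk_mq_request_issue_directly(rq);
}

blk_insert_cloned_request() would then call the __ variant and every
other caller keeps the safe behaviour.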