On Thu, Dec 06 2018 at 9:49pm -0500, Jens Axboe <axboe@xxxxxxxxx> wrote: > After the direct dispatch corruption fix, we permanently disallow direct > dispatch of non read/write requests. This works fine off the normal IO > path, as they will be retried like any other failed direct dispatch > request. But for the blk_insert_cloned_request() that only DM uses to > bypass the bottom level scheduler, we always first attempt direct > dispatch. For some types of requests, that's now a permanent failure, > and no amount of retrying will make that succeed. > > Use the driver private RQF_DONTPREP to track this condition in DM. If > we encounter a BUSY condition from blk_insert_cloned_request(), then > flag the request with RQF_DONTPREP. When we next time see this request, > ask blk_insert_cloned_request() to bypass insert the request directly. > This avoids the livelock of repeatedly trying to direct dispatch a > request, while still retaining the BUSY feedback loop for blk-mq so > that we don't over-dispatch to the lower level queue and mess up > opportunities for merging on the DM queue. > > Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue") > Reported-by: Bart Van Assche <bvanassche@xxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > > --- > > This passes my testing as well, like the previous patch. But unlike the > previous patch, we retain the BUSY feedback loop information for better > merging. But it is kind of gross to workaround the new behaviour to "permanently disallow direct dispatch of non read/write requests" by always failing such requests back to DM for later immediate direct dispatch. That bouncing of the request was acceptable when there was load-based justification for having to retry (and in doing so: taking the cost of freeing the clone request gotten via get_request() from the underlying request_queues). Having to retry like this purely because the request isn't a read or write seems costly.. every non-read-write will have implied request_queue bouncing. In multipath's case: it could select an entirely different underlying path the next time it is destaged (with RQF_DONTPREP set). Which you'd think would negate all hope of IO merging based performance improvements -- but that is a tangent I'll need to ask Ming about (again). I really don't like this business of bouncing requests as a workaround for the recent implementation of the corruption fix. Why not just add an override flag to _really_ allow direct dispatch for _all_ types of requests? (just peeked at linux-block and it is looking like you took jianchao.wang's series to avoid this hack... ;) Awesome.. my work is done for tonight! Mike