On 12/6/18 8:54 PM, Mike Snitzer wrote:
> On Thu, Dec 06 2018 at 9:49pm -0500,
> Jens Axboe <axboe@xxxxxxxxx> wrote:
>
>> After the direct dispatch corruption fix, we permanently disallow direct
>> dispatch of non read/write requests. This works fine off the normal IO
>> path, as they will be retried like any other failed direct dispatch
>> request. But for the blk_insert_cloned_request() that only DM uses to
>> bypass the bottom level scheduler, we always first attempt direct
>> dispatch. For some types of requests, that's now a permanent failure,
>> and no amount of retrying will make that succeed.
>>
>> Use the driver private RQF_DONTPREP to track this condition in DM. If
>> we encounter a BUSY condition from blk_insert_cloned_request(), then
>> flag the request with RQF_DONTPREP. The next time we see this request,
>> ask blk_insert_cloned_request() to bypass insert the request directly.
>> This avoids the livelock of repeatedly trying to direct dispatch a
>> request, while still retaining the BUSY feedback loop for blk-mq so
>> that we don't over-dispatch to the lower level queue and mess up
>> opportunities for merging on the DM queue.
>>
>> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
>> Reported-by: Bart Van Assche <bvanassche@xxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>
>> ---
>>
>> This passes my testing as well, like the previous patch. But unlike the
>> previous patch, we retain the BUSY feedback loop information for better
>> merging.
>
> But it is kind of gross to work around the new behaviour to "permanently
> disallow direct dispatch of non read/write requests" by always failing
> such requests back to DM for later immediate direct dispatch. That
> bouncing of the request was acceptable when there was load-based
> justification for having to retry (and in doing so: taking the cost of
> freeing the clone request gotten via get_request() from the underlying
> request_queues).
>
> Having to retry like this purely because the request isn't a read or
> write seems costly... every non-read-write will have implied
> request_queue bouncing. In multipath's case: it could select an
> entirely different underlying path the next time it is destaged (with
> RQF_DONTPREP set). Which you'd think would negate all hope of
> IO-merging-based performance improvements -- but that is a tangent I'll
> need to ask Ming about (again).
>
> I really don't like this business of bouncing requests as a workaround
> for the recent implementation of the corruption fix.
>
> Why not just add an override flag to _really_ allow direct dispatch for
> _all_ types of requests?
>
> (just peeked at linux-block and it is looking like you took
> jianchao.wang's series to avoid this hack... ;)
>
> Awesome... my work is done for tonight!

The whole point is doing something that is palatable to 4.20 and leaving
the more experimental stuff to 4.21, where we have some weeks to verify
that there are no conditions that cause IO stalls. I don't envision
there will be, but I'm not willing to risk it this late in the 4.20
cycle.

That said, this isn't a quick-and-dirty fix, and I don't think it's fair
to call it a hack. Using RQF_DONTPREP is quite common in drivers to
retain state over multiple ->queue_rq invocations. Using it to avoid
multiple direct dispatch failures (and obviously this new livelock)
seems fine to me.

I really don't want to go around and audit every driver for potential
retained state over special commands; that's why the read+write check is
in place. It's the safe option, which is what we need right now.

-- 
Jens Axboe