On Wed, Aug 09, 2017 at 10:32:52AM +0800, Ming Lei wrote:
> On Wed, Aug 9, 2017 at 8:11 AM, Omar Sandoval <osandov@xxxxxxxxxxx> wrote:
> > On Sat, Aug 05, 2017 at 02:56:46PM +0800, Ming Lei wrote:
> >> When hw queue is busy, we shouldn't take requests from
> >> scheduler queue any more, otherwise IO merge will be
> >> difficult to do.
> >>
> >> This patch fixes the awful IO performance on some
> >> SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber
> >> is used by not taking requests if hw queue is busy.
> >
> > Jens added this behavior in 64765a75ef25 ("blk-mq-sched: ask scheduler
> > for work, if we failed dispatching leftovers"). That change was a big
> > performance improvement, but we didn't figure out why. We'll need to dig
> > up whatever test Jens was doing to make sure it doesn't regress.
>
> Not found info about Jen's test case on this commit from google.
>
> Maybe Jens could provide some input about your test case?

Okay I found my previous discussion with Jens (it was an off-list
discussion). The test case was xfs/297 from xfstests: after
64765a75ef25, the test went from taking ~300 seconds to ~200 seconds on
his SCSI device.

> In theory, if hw queue is busy and requests are left in ->dispatch,
> we should not have continued to dequeue requests from sw/scheduler queue
> any more. Otherwise, IO merge can be hurt much. At least on SCSI devices,
> this improved much on sequential I/O, at least 3X of sequential
> read is increased on lpfc with this patch, in case of mq-deadline.

Right, your patch definitely makes more sense intuitively.

> Or are there other special cases in which we still need
> to push requests hard into a busy hardware?

xfs/297 does a lot of fsyncs and hence a lot of flushes, that could be
the special case.

> And this patch won't have an effect on devices in which queue busy
> is seldom triggered, such as NVMe.
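
To make the merge argument above concrete, here is a toy model in Python.
It is only a sketch and not the blk-mq code: simulate(), its parameters,
and the merge rule (every sequential bio merges into the request still
waiting in the scheduler queue) are all assumptions made up for
illustration. The one thing it tries to capture is that a request moved to
the dispatch list can no longer grow by merging.

```python
from collections import deque


def simulate(n_bios, hw_depth, dequeue_when_busy, complete_every=4):
    """Return the sizes (in bios) of the requests issued to the hardware.

    Toy assumptions (hypothetical, not kernel behavior in detail):
      * one sequential bio arrives per tick and merges into the request
        waiting in the scheduler queue, if there is one;
      * the hardware holds at most hw_depth requests (hw_depth >= 1) and
        completes one request every complete_every ticks;
      * requests on the dispatch list are never merged into.
    """
    sched = deque()      # scheduler queue, each entry is a request size
    dispatch = deque()   # leftovers the busy hardware could not accept
    in_flight = 0
    issued = []
    submitted = 0
    tick = 0

    while submitted < n_bios or sched or dispatch or in_flight:
        tick += 1

        # A new sequential bio arrives and merges if a request is waiting.
        if submitted < n_bios:
            if sched:
                sched[-1] += 1
            else:
                sched.append(1)
            submitted += 1

        # The (slow) hardware finishes one request every few ticks.
        if in_flight and tick % complete_every == 0:
            in_flight -= 1

        # Leftovers on the dispatch list always go first.
        while dispatch and in_flight < hw_depth:
            issued.append(dispatch.popleft())
            in_flight += 1

        # Take from the scheduler only while the hardware has room.
        while sched and not dispatch and in_flight < hw_depth:
            issued.append(sched.popleft())
            in_flight += 1

        # Old behavior in this model: still pull a request out of the
        # scheduler when the hardware is busy, so it stops merging.
        if dequeue_when_busy and sched and in_flight >= hw_depth:
            dispatch.append(sched.popleft())

    return issued


if __name__ == "__main__":
    old = simulate(64, hw_depth=2, dequeue_when_busy=True)
    new = simulate(64, hw_depth=2, dequeue_when_busy=False)
    print("dequeue when busy:   ", len(old), "requests issued")
    print("hold back when busy: ", len(new), "requests issued")
```

In this model, holding requests back while the hardware is busy yields
fewer, larger requests for the same 64 bios. It says nothing about the
flush-heavy xfs/297 case, which the model does not cover.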