I think you missed:
http://git.infradead.org/nvme.git/commit/49c91e3e09dc3c9dd1718df85112a8cce3ab7007
I indeed did, thanks.
But it doesn't help.
We're still having to wait for the first interrupt, and if we're really
fast that's the only completion we have to process.
Try this:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b4b32e6..e2dd9e2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -623,6 +623,8 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	}
 	__nvme_submit_cmd(nvmeq, &cmnd);
 	spin_unlock(&nvmeq->sq_lock);
+	disable_irq_nosync(nvmeq_irq(irq));
+	irq_poll_sched(&nvmeq->iop);
a. This would disable the irq twice, which is wrong, if only because
it will generate a warning.
b. This would trigger ksoftirqd far too often. For it to be effective,
it needs to run only when it should, ideally when the completion queue
has a batch of completions waiting.
After a deeper analysis, I agree with Bart that interrupt coalescing is
needed for it to work. The problem with nvme coalescing, as Jens said, is
the 100us granularity, which is a death penalty. Hannes, Johannes, how does
it look with the devices you are testing with?
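To make the granularity point concrete, here is a minimal sketch of how the NVMe Interrupt Coalescing feature (Set Features, feature ID 0x08) packs its two parameters into command dword 11: the aggregation threshold in bits 7:0 (a 0's based completion count) and the aggregation time in bits 15:8, in 100us units. The helper name irq_coalesce_dword11() is made up for illustration and is not a driver function.

```c
#include <stdint.h>

/* Feature ID of Interrupt Coalescing in NVMe Set Features. */
#define NVME_FEAT_IRQ_COALESCE	0x08

/*
 * Hypothetical helper: build command dword 11 for Interrupt Coalescing.
 *   bits  7:0  aggregation threshold, 0's based number of completions
 *   bits 15:8  aggregation time, in units of 100 microseconds
 */
static uint32_t irq_coalesce_dword11(unsigned int completions,
				     unsigned int time_us)
{
	/* The threshold field is 0's based: 0 means one completion. */
	unsigned int thr = completions ? completions - 1 : 0;
	/* Round up: any non-zero delay costs at least one 100us unit. */
	unsigned int time = (time_us + 99) / 100;

	if (thr > 0xff)
		thr = 0xff;
	if (time > 0xff)
		time = 0xff;

	return (uint32_t)((time << 8) | thr);
}
```

Asking for a 50us delay encodes exactly like asking for 100us, which is the problem: there is nothing between "no delay" and 100us.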
Also, I think that adaptive moderation is needed in order for it to
work well. I know that some networking drivers implemented adaptive
moderation in SW before having HW support for it. It can be done by
maintaining stats and having periodic work that looks at them and
changes the moderation parameters.
Does anyone think that this is something we should consider?
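A rough sketch of what I mean, as plain C rather than driver code. Everything here is an assumption for the sake of discussion: the struct, the function name mod_adapt(), the rate thresholds and the doubling/halving policy are all invented, and a real version would run from delayed work and push the result to the device.

```c
#include <stdint.h>

/* Hypothetical per-queue moderation state; not an existing structure. */
struct mod_state {
	uint64_t completions;	/* completions seen since the last sample */
	uint64_t interrupts;	/* interrupts taken since the last sample */
	unsigned int thr;	/* current aggregation threshold */
	unsigned int time_us;	/* current aggregation time */
};

/* Arbitrary illustrative limits, not measured values. */
#define MOD_THR_MIN		1
#define MOD_THR_MAX		32
#define MOD_IRQ_RATE_HIGH	20000	/* interrupts/sec: coalesce more */
#define MOD_IRQ_RATE_LOW	2000	/* interrupts/sec: coalesce less */

/*
 * Called once per sampling period. Looks at the interrupt rate over the
 * period, adjusts the moderation parameters and resets the counters.
 */
static void mod_adapt(struct mod_state *s, unsigned int period_ms)
{
	uint64_t irq_rate;

	if (!period_ms)
		return;

	irq_rate = s->interrupts * 1000 / period_ms;

	if (irq_rate > MOD_IRQ_RATE_HIGH) {
		/* Busy queue: batch more completions per interrupt. */
		s->thr *= 2;
		if (s->thr > MOD_THR_MAX)
			s->thr = MOD_THR_MAX;
		s->time_us = 100;
	} else if (irq_rate < MOD_IRQ_RATE_LOW) {
		/* Quiet queue: back off towards no coalescing at all. */
		s->thr /= 2;
		if (s->thr < MOD_THR_MIN)
			s->thr = MOD_THR_MIN;
		if (s->thr == MOD_THR_MIN)
			s->time_us = 0;
	}

	s->completions = 0;
	s->interrupts = 0;
}
```

The completion path would only bump the two counters; all the decisions happen in the periodic sample, so the fast path stays cheap.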