On 12/21/17 1:49 PM, Jens Axboe wrote:
> On 12/21/17 1:46 PM, Keith Busch wrote:
>> This is a performance optimization that allows the hardware to work on
>> a command in parallel with the kernel's stats and timeout tracking.
>>
>> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
>> ---
>>  drivers/nvme/host/pci.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index f5800c3c9082..df5550ce0531 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -886,8 +886,6 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  		goto out_cleanup_iod;
>>  	}
>>  
>> -	blk_mq_start_request(req);
>> -
>>  	spin_lock_irq(&nvmeq->q_lock);
>>  	if (unlikely(nvmeq->cq_vector < 0)) {
>>  		ret = BLK_STS_IOERR;
>> @@ -895,6 +893,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  		goto out_cleanup_iod;
>>  	}
>>  	__nvme_submit_cmd(nvmeq, &cmnd);
>> +	blk_mq_start_request(req);
>>  	nvme_process_cq(nvmeq);
>>  	spin_unlock_irq(&nvmeq->q_lock);
>>  	return BLK_STS_OK;
> 
> I guess this will work since we hold the q_lock, but you should probably
> include a comment to that effect, since it's important that we can't find
> the completion before we've called started.
> 
> Actually, you'd need to reorder this after #2 (I'm assuming, since it
> hasn't shown up yet, but I'm guessing it kills the cq check after
> submission), since if we have multiple CPUs on the same hardware queue,
> one of the other CPUs could find a completion event between your
> __nvme_submit_cmd() and blk_mq_start_request() calls. Very short window,
> but it does exist.

Turns out that wasn't what patch 2 was. And the code is right there above
as well, and under the q_lock, so I guess that race doesn't exist.

But that does bring up the question of whether we should always be doing
the nvme_process_cq(nvmeq) after IO submission. For direct/hipri IO, maybe
it's better to make the submission path faster and skip it?

-- 
Jens Axboe
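
As a rough, untested sketch of what skipping the CQ scan for polled IO
could look like in the tail of nvme_queue_rq() (the nvme_req_needs_poll()
check is hypothetical; nothing in nvme_queue_rq() marks a request as
polled at this point):

	__nvme_submit_cmd(nvmeq, &cmnd);
	blk_mq_start_request(req);
	/*
	 * Opportunistically reap completions while we still hold q_lock,
	 * but skip it for polled IO: the submitter will reap those via
	 * the poll path, so keep the submission side as short as possible.
	 * nvme_req_needs_poll() is a made-up helper standing in for
	 * whatever mechanism would flag a request as polled.
	 */
	if (!nvme_req_needs_poll(req))
		nvme_process_cq(nvmeq);
	spin_unlock_irq(&nvmeq->q_lock);
	return BLK_STS_OK;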