On 11/17/21 1:39 AM, Christoph Hellwig wrote:
> On Tue, Nov 16, 2021 at 08:38:07PM -0700, Jens Axboe wrote:
>> This enables the block layer to send us a full plug list of requests
>> that need submitting. The block layer guarantees that they all belong
>> to the same queue, but we do have to check the hardware queue mapping
>> for each request.
>>
>> If errors are encountered, leave them in the passed in list. Then the
>> block layer will handle them individually.
>>
>> This is good for about a 4% improvement in peak performance, taking us
>> from 9.6M to 10M IOPS/core.
>
> The concept looks sensible, but the loop in nvme_queue_rqs is a complete
> mess to follow. What about something like this (untested) on top?

Let me take a closer look.

> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 13722cc400c2c..555a7609580c7 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -509,21 +509,6 @@ static inline void nvme_copy_cmd(struct nvme_queue *nvmeq,
>  		nvmeq->sq_tail = 0;
>  }
>  
> -/**
> - * nvme_submit_cmd() - Copy a command into a queue and ring the doorbell
> - * @nvmeq: The queue to use
> - * @cmd: The command to send
> - * @write_sq: whether to write to the SQ doorbell
> - */
> -static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd,
> -			    bool write_sq)
> -{
> -	spin_lock(&nvmeq->sq_lock);
> -	nvme_copy_cmd(nvmeq, cmd);
> -	nvme_write_sq_db(nvmeq, write_sq);
> -	spin_unlock(&nvmeq->sq_lock);
> -}

You really don't like helpers? Code generation wise it doesn't matter, but
without this and the copy helper we do end up having some trivial
duplicated code...

-- 
Jens Axboe