On Tue, Oct 12, 2021 at 12:17:42PM -0600, Jens Axboe wrote:
> Trivial to do now, just need our own io_batch on the stack and pass that
> in to the usual command completion handling.
>
> I pondered making this dependent on how many entries we had to process,
> but even for a single entry there's no discernable difference in
> performance or latency. Running a sync workload over io_uring:
>
> t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
>
> yields the below performance before the patch:
>
> IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
>
> and the following after:
>
> IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
>
> which definitely isn't slower, about the same if you factor in a bit of
> variance. For peak performance workloads, benchmarking shows a 2%
> improvement.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
>  drivers/nvme/host/pci.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4713da708cd4..fb3de6f68eb1 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1076,8 +1076,10 @@ static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>
>  static inline int nvme_process_cq(struct nvme_queue *nvmeq)
>  {
> +	struct io_batch ib;
>  	int found = 0;
>
> +	ib.req_list = NULL;

Is this really more efficient than

	struct io_batch ib = { };

?
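
As an aside, a minimal userspace sketch of the two forms being weighed
(the struct definition below is a stand-in; the real io_batch in this
series may carry more than the request list):

	#include <stddef.h>
	#include <stdio.h>

	/* Hypothetical stand-in for the kernel's struct io_batch; assumed
	 * here to hold only the request list, which may not match the
	 * actual definition in the series. */
	struct io_batch {
		void *req_list;
	};

	int main(void)
	{
		/* Form used in the patch: initialize only the member that
		 * the completion path consumes. */
		struct io_batch a;
		a.req_list = NULL;

		/* Form asked about above: an empty initializer zeroes every
		 * member (GNU C / C23 syntax, common in kernel code). */
		struct io_batch b = { };

		printf("%p %p\n", a.req_list, b.req_list);
		return 0;
	}

If io_batch really only holds the list pointer, both forms should boil
down to the same single store; the explicit assignment would only save
work if the struct grows members the fast path never touches.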