On 11/17/21 8:55 AM, Jens Axboe wrote:
> On 11/17/21 1:39 AM, Christoph Hellwig wrote:
>> On Tue, Nov 16, 2021 at 08:38:07PM -0700, Jens Axboe wrote:
>>> This enables the block layer to send us a full plug list of requests
>>> that need submitting. The block layer guarantees that they all belong
>>> to the same queue, but we do have to check the hardware queue mapping
>>> for each request.
>>>
>>> If errors are encountered, leave them in the passed in list. Then the
>>> block layer will handle them individually.
>>>
>>> This is good for about a 4% improvement in peak performance, taking us
>>> from 9.6M to 10M IOPS/core.
>>
>> The concept looks sensible, but the loop in nvme_queue_rqs is a complete
>> mess to follow. What about something like this (untested) on top?
>
> Let me take a closer look.

Something changed, efficiency is way down:

     2.26%     +4.34%  [nvme]            [k] nvme_queue_rqs

-- 
Jens Axboe