Please don't top post, we just lost all context here unless I had fixed
it up for you.

On 1/23/20 12:25 PM, Muraliraja Muniraju wrote:
>
> On Thu, Jan 23, 2020 at 10:59 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 1/21/20 12:25 PM, muraliraja.muniraju wrote:
>>> Current loop device implementation has a single kthread worker and
>>> drains one request at a time to completion. If the underneath device is
>>> slow then this reduces the concurrency significantly. To help in these
>>> cases, adding multiple loop workers increases the concurrency. Also to
>>> retain the old behaviour the default number of loop workers is 1 and can
>>> be tuned via the ioctl.
>>
>> Have you considered using blk-mq for this? Right now loop just does
>> some basic checks and then queues for a thread. If you bump nr_hw_queues
>> up (provide a parameter for that) and set BLK_MQ_F_BLOCKING in the
>> tag flags, then that might be a more viable approach for handling this.
>
> I see that the kernel is already is using the multi queues with the
> number of hardware queues is 1. But the problem IMO is that the worker
> seems to be processing 1 request at a time, to parallelize requests
> and have more concurrency more workers needs to be added. I also tried
> increasing the nr_hw_queues without increasing the number of workers,
> I did not see any difference in performance and it stayed the same. It
> allows to queue more requests but it is processed one at a time. I
> have not tried with enabling BLK_MQ_F_BLOCKING though. I see that it
> can schedule requests early.

The experiment is useless without BLK_MQ_F_BLOCKING set, so you need
that at least. With that, you _will_ see work items processed in
parallel, depending on where they are queued from.

-- 
Jens Axboe