I see that the kernel is already using blk-mq for loop, with the number of hardware queues set to 1. But the problem, IMO, is that the worker seems to process one request at a time; to parallelize requests and get more concurrency, more workers need to be added. I also tried increasing nr_hw_queues without increasing the number of workers, and performance stayed the same. It allows more requests to be queued, but they are still processed one at a time. I have not tried enabling BLK_MQ_F_BLOCKING, though. From what I see, it can schedule requests early.

On Thu, Jan 23, 2020 at 10:59 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 1/21/20 12:25 PM, muraliraja.muniraju wrote:
> > Current loop device implementation has a single kthread worker and
> > drains one request at a time to completion. If the underneath device is
> > slow then this reduces the concurrency significantly. To help in these
> > cases, adding multiple loop workers increases the concurrency. Also to
> > retain the old behaviour the default number of loop workers is 1 and can
> > be tuned via the ioctl.
>
> Have you considered using blk-mq for this? Right now loop just does
> some basic checks and then queues for a thread. If you bump nr_hw_queues
> up (provide a parameter for that) and set BLK_MQ_F_BLOCKING in the
> tag flags, then that might be a more viable approach for handling this.
>
> --
> Jens Axboe
>