The following patches, made over Martin's 5.12 branches, fix two issues:

1. target_core_iblock plugs and unplugs the queue for every command. To
handle this, and to handle an issue that vhost-scsi and loop were avoiding
by adding their own workqueue, I added a new submission workqueue to LIO.
Drivers can pass cmds to it, and we can then submit batches of cmds.

2. vhost-scsi and loop were queueing a work per cmd on the submission side,
and LIO was queueing a work per cmd on the completion side. The cap on
running works is 512 (max_active), so we can end up using a lot of threads
when submissions start blocking because they hit the block tag limit, or
when the completion side blocks trying to send the cmd. In this patchset I
just use a cmd list per session to avoid abusing the workqueue layer.

The combined patchset fixes a major perf issue we've been hitting where
IOPS is stuck at 230K when running:

    fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --numjobs=8 --time_based --group_reporting --runtime=60

The patches in this set get me to 350K when using devices that have native
IOPS of around 400-500K.

Note that 5.12 has some interrupt changes that my patches collide with.
Martin's 5.12 branches had the changes, so I based my patches on that.

V2:
- Fix up container_of use coding style.
- Handle offlist review comment from Laurence: with both the original code
and my patches we can hit a bug where the cmd times out and LIO starts up
the TMR code, but it misses the cmd because it's on the workqueue.
- Made the work per device instead of per session, to handle the previous
issue and so that if one dev hits an issue it sleeps on, it won't block
other devices.
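
For illustration only, the batching idea described above (queue cmds on a per-device list, then have a single work drain the list and submit everything under one plug) could look roughly like the userspace sketch below. This is not the code in the patches. Every name (sketch_cmd, sketch_dev_queue, sketch_queue_cmd, sketch_run_queue) is hypothetical, C11 atomics stand in for whatever list primitive the kernel code uses, and the plug_count counter only stands in for block-layer plugging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical command; a stand-in for a real target cmd structure. */
struct sketch_cmd {
	int tag;
	struct sketch_cmd *next;
};

/* Hypothetical per-device submission queue. */
struct sketch_dev_queue {
	_Atomic(struct sketch_cmd *) head; /* lock-free LIFO of pending cmds */
	int plug_count;                    /* times a batch was "plugged" */
	int submitted;                     /* total cmds handed to the backend */
	int last_tag;                      /* tag of the most recent submission */
};

static void sketch_queue_init(struct sketch_dev_queue *q)
{
	assert(q != NULL);
	atomic_init(&q->head, NULL);
	q->plug_count = 0;
	q->submitted = 0;
	q->last_tag = -1;
}

/*
 * Producer side: push one cmd onto the device list.
 * Returns 1 if the list was empty before the push, which is the only
 * case where the caller would need to schedule the (single) work item.
 */
static int sketch_queue_cmd(struct sketch_dev_queue *q, struct sketch_cmd *cmd)
{
	struct sketch_cmd *old = atomic_load(&q->head);

	do {
		cmd->next = old;
	} while (!atomic_compare_exchange_weak(&q->head, &old, cmd));

	return old == NULL;
}

/*
 * Worker side: take the whole list in one atomic exchange, restore
 * submission order, and submit the batch under a single "plug".
 * Returns the number of cmds submitted.
 */
static int sketch_run_queue(struct sketch_dev_queue *q)
{
	struct sketch_cmd *list = atomic_exchange(&q->head, NULL);
	struct sketch_cmd *fifo = NULL;
	int n = 0;

	if (!list)
		return 0;

	/* The list was built LIFO; reverse it so cmds go out in arrival order. */
	while (list) {
		struct sketch_cmd *next = list->next;

		list->next = fifo;
		fifo = list;
		list = next;
	}

	q->plug_count++; /* one plug for the whole batch, not one per cmd */
	for (; fifo; fifo = fifo->next) {
		q->submitted++;
		q->last_tag = fifo->tag;
		n++;
	}
	/* the matching unplug would go here */

	return n;
}
```

In this sketch, three cmds queued before the work runs cost one plug/unplug pair and one work item rather than three of each, which is the effect both items above are after.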