The following patches were made over Martin's 5.12 branches to handle
conflicts with the in_interrupt changes.

The patches fix the following issues:

1. target_core_iblock plugs and unplugs the queue for every command. To
handle this issue and handle an issue that vhost-scsi and loop were
avoiding by adding their own workqueue, I added a new submission
workqueue to LIO. Drivers can pass cmds to it, and we can then submit
batches of cmds.

2. vhost-scsi and loop on the submission side were doing a work per cmd,
but because we can block in the block layer on resources like tags we
can end up creating lots of threads that will fight each other. In this
patchset I just use a cmd list per device to avoid abusing the workqueue
layer and to better batch the cmds to the lower layers.

The combined patchset fixes a major perf issue we've been hitting with
vhost-scsi where IOPs were stuck at 230K when running:

    fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k \
        --ioengine=libaio --iodepth=128 --numjobs=8 --time_based \
        --group_reporting --runtime=60

The patches in this set get me to 350K when using devices that have
native IOPs of around 400-500K.

3. Fix target_submit* error handling. While handling Christoph's comment
to kill target_submit_cmd_map_sgls I hit several bugs that are now also
fixed up.

V6:
- Fix gfp function arg comment.
- Drop part of git commit message about xcopy using GFP_KERNEL.
- Allow user to pass in specific CPU to do completions on.

V5:
- Add WARN if driver has used simple API and target_stop_session
- Added fix for simple API users (usb, sbp, and xen) for early failure
  (invalid LUN, write on a RO dev, etc) handling.

V4:
- Fixed the target_submit error handling.
- Dropped get_cdb callback.
- Fixed kernel robot errors for incorrect return values and unused
  variables.
- Used flush instead of cancel to fix bug in tmr code.
- Fixed race in tcmu.
- Made completion affinity handling a configfs setting.
- Dropped patch that added the per device work lists. It really helped
  a lot for higher perf initiators and tcm loop but only gave around a
  5% boost to other drivers. So I dropped it for now to see if there is
  something more generic we can do.

V3:
- Fix rc type in target_submit so it's a sense_reason_t.
- Add BUG_ON if caller uses target_queue_cmd_submit but hasn't
  implemented get_cdb.
- Drop unused variables in loop.
- Fix race in tcmu plug check.
- Add comment about how plug check works in iblock.
- Do a flush when handling TMRs instead of cancel.

V2:
- Fix up container_of use coding style.
- Handle offlist review comment from Laurence where with the original
  code and my patches we can hit a bug where the cmd times out, LIO
  starts up the TMR code, but it misses the cmd because it's on the
  workqueue.
- Made the work per device instead of per session to handle the
  previous issue, and so that if one dev sleeps on some issue it won't
  block other devices.
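
For reference, the batching idea from items 1 and 2 can be sketched in
plain userspace C. This is only an illustration under assumptions, not
the LIO code: struct cmd, struct dev_queue and the dev_queue_* helpers
are made-up names, there is no locking, and the plugs counter simply
stands in for one plug/unplug pair around the submissions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; these are not LIO structures. */
struct cmd {
	int tag;
	struct cmd *next;
};

struct dev_queue {
	struct cmd *head;	/* pending cmds, oldest first */
	struct cmd **tail;	/* where the next cmd gets linked */
	int plugs;		/* number of plug/unplug pairs done */
	int submitted;		/* total cmds handed to the lower layer */
};

void dev_queue_init(struct dev_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
	q->plugs = 0;
	q->submitted = 0;
}

/* Driver side: append a cmd to the per-device list (no locking here). */
void dev_queue_add(struct dev_queue *q, struct cmd *c)
{
	assert(c != NULL);
	c->next = NULL;
	*q->tail = c;
	q->tail = &c->next;
}

/*
 * Worker side: take everything queued so far and submit it as one
 * batch. Returns the number of cmds submitted.
 */
int dev_queue_flush(struct dev_queue *q)
{
	struct cmd *c = q->head;
	int n = 0;

	if (!c)
		return 0;

	/* Detach the whole list so new cmds go into the next batch. */
	q->head = NULL;
	q->tail = &q->head;

	q->plugs++;		/* assumption: a real backend plugs here */
	while (c) {
		struct cmd *next = c->next;

		q->submitted++;	/* stand-in for the real submission */
		n++;
		c = next;
	}
	/* assumption: a real backend unplugs here, once per batch */
	return n;
}
```

Queueing four cmds and then calling dev_queue_flush() once submits all
four under a single plug, versus four plug/unplug pairs if each cmd is
submitted on its own.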