The following patches apply to Martin's staging tree or Linus's tree. The
patches' main goal is to take the locks out of the main IO path, but for
ordered cmds they also fix a handful of bugs.

For the locks, we currently have:

1. lun_tg_pt_gp_lock
2. delayed_cmd_lock
3. dev_reservation_lock

and this set takes out 1 and 2. With them removed, a simple fio:

fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
    --ioengine=libaio --iodepth=64 --numjobs=$NUM_QUEUES

can increase IOPs by up to 30% (from a max of 1.4M to 2M) when using
multiple queues and vhost-scsi with the multiple vhost thread patches, or
tcm_loop with nr_hw_queues set.

Note: Without the patches I normally hit a ceiling of 1.4M IOPs at around
8 queues. With the patches I hit a ceiling of 2M IOPs at around 16 queues.
If I cheat and set emulate_pr=0 so the reservation lock is removed as
well, then it scales nicely and you can continue to add a job and queue
per CPU (at least up to 20 CPUs, which is when I run out of CPUs).
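
For anyone who wants to look at the per-queue scaling themselves, the fio run above could be swept over several queue counts. The script below is only a rough sketch, not the exact procedure used for the numbers above: the DEV, QUEUE_COUNTS and DRY_RUN variables and the fio_cmd helper are made up for illustration, and --name is added on the assumption that fio wants a job name on the command line. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical sweep over queue counts; variable names are illustrative only.
DEV="${DEV:-/dev/sdb}"                      # device under test (from the letter)
QUEUE_COUNTS="${QUEUE_COUNTS:-1 2 4 8 16}"  # assumed set of job counts to try
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = run them

# Build the fio command line for a given number of jobs ($1).
# Everything except --name is taken from the command in the letter.
fio_cmd() {
    echo "fio --name=randrw-q$1 --filename=$DEV --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=64 --numjobs=$1"
}

for NUM_QUEUES in $QUEUE_COUNTS; do
    cmd=$(fio_cmd "$NUM_QUEUES")
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        # Unquoted on purpose so the string splits into separate arguments.
        $cmd
    fi
done
```

Note that rw=randrw writes to the target, so DRY_RUN=0 should only be used against a scratch device.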