The following patches, made over Martin's staging branch, fix some refcounting issues I hit while testing and improve the locking in the IO paths. To do the latter, the patches:

1. move the sess_cmd_lock to tcm_qla2xxx, since it was the only driver using the sess_cmd_list.
2. make the execution lock/list per-CPU(ish). I just allocate nr_cpu_ids worth of locks/lists and then make sure each cmd is completed on the CPU it was started on.

With the patches I'm seeing a 25% improvement in IOPS for small IO tests like:

    fio --filename=/dev/sdXYZ --direct=1 --rw=randrw --bs=4k \
        --iodepth=128 --numjobs=16

with drivers like vhost (with those other patches on the list that fix up multiple virtqueue support), and with the included loop patch when the number of hw queues is increased.

I've dropped the RFC because Himanshu got my hw working and I've tested the qlogic patches, so please review.
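For anyone reviewing item 2, here is a minimal userspace sketch of the general per-CPU lock/list idea. It is not the code in the patches. All names (struct cmd, exec_bucket, cmd_submit, cmd_complete) are hypothetical, and a C11 atomic_flag spinlock stands in for the kernel's spinlock_t. Each CPU index gets its own lock and list. A cmd records the index it was submitted on, so completion takes the same lock and cmds on different CPUs never contend.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical command; only the fields needed for the list are shown. */
struct cmd {
    struct cmd *next, *prev;
    int cpu;                     /* bucket (CPU index) chosen at submit time */
};

/* One lock + list per CPU index (stand-in for spinlock_t + list_head). */
struct exec_bucket {
    atomic_flag lock;
    struct cmd head;             /* sentinel of a circular doubly linked list */
    int count;
};

struct exec_buckets {
    int nr;
    struct exec_bucket *b;
};

static void bucket_lock(struct exec_bucket *bk)
{
    while (atomic_flag_test_and_set_explicit(&bk->lock, memory_order_acquire))
        ;                        /* spin until the holder releases */
}

static void bucket_unlock(struct exec_bucket *bk)
{
    atomic_flag_clear_explicit(&bk->lock, memory_order_release);
}

/* Allocate nr_cpu_ids worth of locks/lists, as the cover letter describes. */
static int buckets_init(struct exec_buckets *eb, int nr_cpu_ids)
{
    eb->b = calloc((size_t)nr_cpu_ids, sizeof(*eb->b));
    if (!eb->b)
        return -1;
    eb->nr = nr_cpu_ids;
    for (int i = 0; i < nr_cpu_ids; i++) {
        atomic_flag_clear(&eb->b[i].lock);
        eb->b[i].head.next = eb->b[i].head.prev = &eb->b[i].head;
    }
    return 0;
}

static void buckets_destroy(struct exec_buckets *eb)
{
    free(eb->b);
    eb->b = NULL;
    eb->nr = 0;
}

/* Queue a cmd on the bucket of the CPU it is started on (tail insert). */
static void cmd_submit(struct exec_buckets *eb, struct cmd *c, int cpu)
{
    struct exec_bucket *bk = &eb->b[cpu];

    c->cpu = cpu;
    bucket_lock(bk);
    c->next = &bk->head;
    c->prev = bk->head.prev;
    bk->head.prev->next = c;
    bk->head.prev = c;
    bk->count++;
    bucket_unlock(bk);
}

/*
 * Remove a cmd from the bucket recorded at submit time, so submit and
 * complete always take the same per-CPU lock. In the kernel the completion
 * itself would also be steered to that CPU to keep the lock cache-local.
 */
static void cmd_complete(struct exec_buckets *eb, struct cmd *c)
{
    struct exec_bucket *bk = &eb->b[c->cpu];

    bucket_lock(bk);
    c->prev->next = c->next;
    c->next->prev = c->prev;
    c->next = c->prev = c;
    bk->count--;
    bucket_unlock(bk);
}
```

The tradeoff is memory (one lock/list per possible CPU) in exchange for removing the single shared lock that every submission and completion previously serialized on.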