The following patches apply over Linus's tree or Martin's staging
branch. They fix up the locking and refcount handling in the iscsi code
so that for software iscsi we no longer need a lock when going from
queuecommand to the xmit thread, and no longer need a common iscsi level
lock between the xmit thread and completion paths.

For simple throughput workloads like

    fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=256k \
        --ioengine=libaio --iodepth=128 --numjobs=1 --time_based \
        --group_reporting --name=throughput --runtime=120

throughput goes from 24 Gb/s to 28 Gb/s, at which point I hit a
bottleneck on the target side.

IOPs might increase by around 10% in some cases with:

    fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=4k \
        --ioengine=libaio --iodepth=128 --numjobs=1 --time_based \
        --group_reporting --name=throughput --runtime=120

I'm still debugging some target side issues.

A bigger advantage I'm seeing with the patches is that for setups where
software iscsi shares CPUs with other subsystems like vhost, IOPs can
increase by up to 20%.

Notes:
- I've tested iscsi_tcp, ib_iser, be2iscsi and qedi. I don't have cxgbi
  or bnx2i hardware, but the cxgbi changes were API only.
- Lee, the first 2 patches are new bug fixes. The first half of the
  series is then similar to what you saw before. I was not sure how far
  through them you were. The second half, the part that removes the back
  lock and frwd lock from iscsi_queuecommand, is new.