On 10/7/2014 4:07 PM, Bart Van Assche wrote:
Improve performance by using multiple RDMA/RC channels per SCSI host for communication with an SRP target. About the implementation: - Introduce a loop over all channels in the code that uses target->ch. - Set the SRP_MULTICHAN_MULTI flag during login for the creation of the second and subsequent channels. - RDMA completion vectors are chosen such that RDMA completion interrupts are handled by the CPU socket that submitted the I/O request. As one can see in this patch it has been assumed if a system contains n CPU sockets and m RDMA completion vectors have been assigned to an RDMA HCA that IRQ affinity has been configured such that completion vectors [i*m/n..(i+1)*m/n) are bound to CPU socket i with 0 <= i < n. - Modify srp_free_ch_ib() and srp_free_req_data() such that it becomes safe to invoke these functions after the corresponding allocation function failed. - Add a ch_count sysfs attribute per target port. Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx> Cc: Sebastian Parschauer <sebastian.riemer@xxxxxxxxxxxxxxxx>
<SNIP>
spin_lock_irqsave(&ch->lock, flags); ch->req_lim += be32_to_cpu(rsp->req_lim_delta); @@ -1906,7 +1970,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) goto err;
Bart, Any chance you can share some perf output on this code? I'm interested of knowing the contention on target->lock that is still taken on the IO path across channels. Can we think on how to avoid it? Also would like to understand the where did the bottleneck transition. Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html