Hello, I am seeing an issue with 100Gbit EDR InfiniBand (mlx5_ib and ConnectX-4) when connecting to high-speed arrays after tuning the ib_srp parameters to their maximum allowed values. The tuning is meant to maximize performance:

options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048

We get into a situation where srp_map_data() fails in srp_queuecommand():

[  353.811594] scsi host4: ib_srp: Failed to map data (-5)
[  353.811619] scsi host4: Could not fit S/G list into SRP_CMD
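For reference, module options like these would normally be set via a modprobe configuration file; a minimal sketch (the file name ib_srp.conf is an assumption, and the sysfs paths are the standard locations for module parameters):

```shell
# /etc/modprobe.d/ib_srp.conf  (hypothetical file name)
# Raise the SRP initiator's scatter/gather limits to their maximums,
# as in the report above. Takes effect on the next module load.
options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048
```

After reloading ib_srp, the active values can be confirmed under /sys/module/ib_srp/parameters/ (e.g. cmd_sg_entries and indirect_sg_entries).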
I'd say that's an unusual limit to hit. What is your workload? With CX4 (FR by default) you'd need a *very* unaligned SG layout or a huge transfer size to get there.
On the array:

[ 6097.205716] ib_srpt IB send queue full (needed 68)
[ 6097.233325] ib_srpt srpt_xfer_data[2731] queue full -- ret=-12
Is this upstream srpt? If all the SRP commands contain ~255 (or even ~50) descriptors, then I'm not at all surprised you get a queue overrun. Each command includes num_sg_entries worth of RDMA posts...
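To illustrate the arithmetic behind that overrun, here is a back-of-the-envelope sketch: the send-queue depth of 4096 is an illustrative assumption, and the per-command post count of 68 is taken from the "needed 68" line in the array log above.

```python
# Back-of-the-envelope: why many descriptors per command can overrun
# the target's IB send queue. All numbers here are illustrative.

def wqes_outstanding(commands_in_flight, rdma_posts_per_cmd,
                     completion_wqes_per_cmd=1):
    """Total work-queue entries a batch of commands can occupy at once:
    one post per SG descriptor plus a final completion/response post."""
    return commands_in_flight * (rdma_posts_per_cmd + completion_wqes_per_cmd)

send_queue_size = 4096   # assumed target send-queue depth
per_cmd = 68             # matches the "needed 68" in the log above

# How many such commands fit before the queue is full?
max_cmds = send_queue_size // (per_cmd + 1)
print(max_cmds)
```

With these numbers only a few dozen commands can be in flight before a new post fails with a queue-full error, which is consistent with the -12 (-ENOMEM) seen on the array.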