Hi Steve, Potnuri, & Co,

On Tue, 2016-10-18 at 09:34 -0500, Steve Wise wrote:
> >
> > > I tried out this change and it works fine with iwarp. I dont see SQ
> > > overflow. Apparently we have increased the sq too big to overflow. I am going
> > > to let it run with higher workloads for longer time, to see if it holds good.
> >
> > Actually on second thought, this patch is an overkill. Effectively we
> > now set:
> >
> > MAX_CMD=266
> > and max_rdma_ctx=128 so together we take 394 which seems to too much.
> >
> > If we go by the scheme of 1 rdma + 1 send for each IO we need:
> > - 128 sends
> > - 128 rdmas
> > - 10 miscs
> >
> > so this gives 266.
> >
> > Perhaps this is due to the fact that iWARP needs to register memory for
> > rdma reads as well? (and also rdma writes > 128k for chelsio HW right?)
> >
>
> iWARP definitely needs to register memory for the target of reads, due to
> REMOTE_WRITE requirement for the protocol. The source of a write doesn't need
> to register memory, but the SGE depth can cause multiple WRITE WRs to be
> required to service the IO. And in theory there should be some threshold where
> it might be better performance-wise to do a memory register + 1 WRITE vs X
> WRITEs.
>
> As you mentioned, the RW API should account for this, but perhaps it is still
> off some. Bharat, have a look into the RDMA-RW API and let us see if we can
> figure out if the additional SQ depth it adds is sufficient.
>
> > What is the workload you are running? with immediatedata enabled you
> > should issue reg+rdma_read+send only for writes > 8k.
> >
> > Does this happen when you run only reads for example?
> >
> > I guess its time to get the sq accounting into shape...
>
> So to sum up - 2 issues:
>
> 1) we believe the iSER + RW API correctly sizes the SQ, yet we're seeing SQ
> overflows. So the SQ sizing needs more investigation.
>
> 2) if the SQ is full, then the iSER/target code is supposed to resubmit. And
> apparently that isn't working.
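
For #1, the arithmetic quoted above can be written down as a quick sanity
check. This is only a sketch: the ex_* names are hypothetical and are not
taken from iser-target or the RDMA-RW API, and the three-WR case assumes
every IO needs reg + rdma + send, which the thread only suggests for some
IOs.

```c
/*
 * Hypothetical illustration of the SQ accounting discussed in this thread.
 * None of these names exist in iser-target or the RDMA-RW API.
 */
enum {
	EX_IO_DEPTH = 128,	/* outstanding IOs ("128 sends", "128 rdmas") */
	EX_MISC_WRS = 10,	/* "10 miscs" */
	EX_RDMA_CTXS = 128,	/* max_rdma_ctx as quoted */
};

/* SQ slots needed when each IO posts wrs_per_io WRs, plus misc slots. */
static inline unsigned int ex_sq_depth(unsigned int io_depth,
				       unsigned int wrs_per_io,
				       unsigned int misc_wrs)
{
	return io_depth * wrs_per_io + misc_wrs;
}

/* What the patch under discussion effectively asks for. */
static inline unsigned int ex_sq_depth_patched(unsigned int max_cmd,
					       unsigned int rdma_ctxs)
{
	return max_cmd + rdma_ctxs;
}
```

With the numbers from the thread, ex_sq_depth(128, 2, 10) gives 266 and
ex_sq_depth_patched(266, 128) gives 394. By the same arithmetic,
ex_sq_depth(128, 3, 10) is also 394, so the larger value is what a uniform
reg + rdma + send per IO would need.
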

For #2, target-core expects -ENOMEM or -EAGAIN return from fabric driver
callbacks to signal internal queue-full retry logic. Otherwise, the extra
se_cmd->cmd_kref response SCF_ACK_KREF is leaked until session shutdown
and/or reinstatement occurs.

AFAICT, Potnuri's earlier hung task with v4.8.y + ABORT_TASK is likely the
earlier v4.1+ regression:

https://github.com/torvalds/linux/commit/527268df31e57cf2b6d417198717c6d6afdb1e3e

That said, there is room for improvement in target-core queue-full error
signaling, and iscsi-target/iser-target callback error propagation.

Sending out a series shortly to address these particular items.

Please have a look.
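
To make the queue-full convention above concrete, here is a minimal
userspace sketch. It is not the target-core or iser-target code: the ex_*
names are hypothetical, and the choice of -EAGAIN for a full SQ is an
assumption made for illustration.

```c
#include <errno.h>

/* Hypothetical outcome of a fabric driver callback, as seen by the core. */
enum ex_cb_outcome {
	EX_CB_DONE,		/* callback succeeded */
	EX_CB_QUEUE_FULL,	/* -ENOMEM / -EAGAIN: retry via queue-full logic */
	EX_CB_FATAL,		/* any other error: no queue-full retry */
};

/* Classify a callback return value the way the text above describes. */
static inline enum ex_cb_outcome ex_classify_cb_ret(int ret)
{
	if (ret == 0)
		return EX_CB_DONE;
	if (ret == -ENOMEM || ret == -EAGAIN)
		return EX_CB_QUEUE_FULL;
	return EX_CB_FATAL;
}

/*
 * Hypothetical fabric-side check: report "no room in the SQ" as -EAGAIN
 * so the caller can retry, rather than returning some other error.
 */
static inline int ex_reserve_sq(unsigned int sq_used, unsigned int sq_depth,
				unsigned int wrs_needed)
{
	if (wrs_needed > sq_depth || sq_used > sq_depth - wrs_needed)
		return -EAGAIN;
	return 0;
}
```

In this sketch, ex_classify_cb_ret(ex_reserve_sq(266, 266, 1)) lands in
EX_CB_QUEUE_FULL, which is the case that should trigger a resubmit; any
other negative errno would fall through to EX_CB_FATAL.
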