On Thursday, October 20, 2016 at 14:04:34 +0530, Sagi Grimberg wrote:
> Hey Jason,
>
> >> 1) we believe the iSER + RW API correctly sizes the SQ, yet we're seeing SQ
> >> overflows. So the SQ sizing needs more investigation.
> >
> > NFS had this sort of problem - in that case it was because the code
> > was assuming that a RQ completion implied SQ space - that is not
> > legal, only direct completions from SQ WCs can guide available space
> > in the SQ..
>
> Its not the same problem. iser-target does not tie SQ and RQ spaces.
> The origin here is the difference between IB/RoCE and iWARP and the
> chelsio HW that makes it hard to predict the SQ correct size.
>
> iWARP needs extra registration for rdma reads and the chelsio device
> seems to be limited in the number of pages per registration so this
> configuration will need a larger send queue.
>
> Another problem is that we don't have a correct retry flow.
>
> Hopefully we can address that in the RW API which is designed to hide
> these details from the ULP...

Hi Sagi,

Here is what our further analysis of the SQ dump at the time of the overflow shows:

The RDMA read/write API is creating long chains (32 WRs) to handle large iSCSI READs. To write the default iSCSI block size of 512KB, iw_cxgb4, whose advertised max number of SGEs is 4 pages (~16KB) per WRITE, needs a chain of 32 WRs. (Another possible factor is that these are all unsignaled WRs, which complete only after the next signaled WR.)

But apparently rdma_rw_init_qp() assumes that any given IO will take only one WRITE WR to convey the data. This is evidently incorrect: rdma_rw_init_qp() needs to size the queue based on the device's max_sge for writes and reads, and on the sg_tablesize that RDMA read/write is used for, such as the initiator's ISCSI_ISER_MAX_SG_TABLESIZE.

If the above analysis is correct, please suggest how this could be fixed.
Further, using MRs for RDMA WRITE, by setting the rdma_rw_force_mr = 1 module parameter of ib_core, avoids the SQ overflow by registering a single REG_MR and using that MR for a single WRITE WR. So an rdma-rw IO chain of, say, 32 WRITE WRs becomes just 3 WRs: REG_MR + WRITE + INV_MR, since max_fast_reg_page_list_len of iw_cxgb4 is 128 pages. (By default force_mr is not set, and per rdma_rw_io_needs_mr(), iw_cxgb4 can then use MRs only for RDMA READs.)

From this, is there any possibility that we could use an MR when the WRITE WR chain exceeds a certain length?

Thanks for your time!
-Bharat.