Re: SQ overflow seen running isert traffic with high block sizes

On Mon, Jan 29, 2018 at 09:17:02PM +0200, Sagi Grimberg wrote:
> 
> > > > First, would it be helpful to limit maximum payload size per I/O for
> > > > consumers based on number of iser-target sq hw sges..?
> > > 
> > > I don't think you need to limit the maximum payload, but instead
> > > initialize max_wr based on the number of supported SGEs, instead
> > > of what is there today:
> > > #define ISERT_QP_MAX_REQ_DTOS   (ISCSI_DEF_XMIT_CMDS_MAX +    \
> > >                                  ISERT_MAX_TX_MISC_PDUS  + \
> > >                                  ISERT_MAX_RX_MISC_PDUS)
> > > Add the maximum number of WQEs per command,
> > > The calculation of the number of WQEs per command needs to be something
> > > like "MAX_TRANSFER_SIZE/(numSges*PAGE_SIZE)".
> > > 
> > 
> > Makes sense, MAX_TRANSFER_SIZE would be defined globally by iser-target,
> > right..?
> > 
> > Btw, I'm not sure how this affects usage of ISER_MAX_TX_CQ_LEN +
> > ISER_MAX_CQ_LEN, which currently depend on ISERT_QP_MAX_REQ_DTOS..
> > 
> > Sagi, what are your thoughts wrt changing attr.cap.max_send_wr at
> > runtime vs. exposing a smaller max_data_sg_nents=32 for ib_devices with
> > limited attr.cap.max_send_sge..?
> 
> Sorry for the late reply,
> 
> Can we go back and understand why we need to limit the isert transfer
> size? I would suggest that we handle queue-full scenarios instead
> of limiting the transferred payload size.
> 
> From the trace Shiraz sent, it looks like:
> a) we are too chatty when failing to post a wr on a queue-pair
> (something that can happen by design), and
> b) isert escalates to terminating the connection, which means we
> screwed up handling it.
> 
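
(Inline note, for illustration only: a minimal sketch of handling the
queue-full case at the posting site instead of escalating to connection
teardown. isert_post_response(), cmd_lock, queue_entry and postponed_cmds
are hypothetical names, not current isert code, and the provider is assumed
to return -ENOMEM on SQ overflow.)

static int isert_post_response(struct isert_conn *isert_conn,
                               struct isert_cmd *isert_cmd)
{
        struct ib_send_wr *bad_wr;
        int ret;

        ret = ib_post_send(isert_conn->qp, &isert_cmd->tx_desc.send_wr,
                           &bad_wr);
        if (ret == -ENOMEM) {
                /*
                 * A full SQ is legal by design: park the command on a
                 * per-connection list and re-post it from the send
                 * completion handler instead of terminating the connection.
                 */
                spin_lock(&isert_conn->cmd_lock);
                list_add_tail(&isert_cmd->queue_entry,
                              &isert_conn->postponed_cmds);
                spin_unlock(&isert_conn->cmd_lock);
                return 0;
        }
        return ret;
}
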
> Shiraz, can you explain these messages:
> [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
> [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3

These are device-specific Asynchronous Event (AE) log messages I turned on.
They indicate the QP received a FIN while in RTS and was eventually moved
to the CLOSED state.

 
> Who is initiating the connection teardown? the initiator or the target?
> (looks like the initiator gave up on iscsi ping timeout expiration)
> 

Initiator
