RE: [PATCH for-rc] IB/isert: Fix hang in iscsit_wait_for_tag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> Subject: Re: [PATCH for-rc] IB/isert: Fix hang in iscsit_wait_for_tag
> 
> 
> > From: Mustafa Ismail <mustafa.ismail@xxxxxxxxx>
> >
> > Running fio can occasionally cause a hang when sbitmap_queue_get()
> > fails to return a tag in iscsit_allocate_cmd() and
> > iscsit_wait_for_tag() is called and will never return from the
> > schedule(). This is because the polling thread of the CQ is suspended,
> > and will not poll for a SQ completion which would free up a tag.
> > Fix this by creating a separate CQ for the SQ so that send completions
> > are processed on a separate thread and are not blocked when the RQ CQ
> > is stalled.
> >
> > Fixes: 10e9cbb6b531 ("scsi: target: Convert target drivers to use
> > sbitmap")
> 
> Is this the real offending commit? What prevented this from happening
> before?

Maybe going to a global bitmap instead of per cpu ida makes it less likely to occur.
Going to single CQ maybe the real root cause in this commit:6f0fae3d7797("iser-target: Use single CQ for TX and RX")

> > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
> > Signed-off-by: Mustafa Ismail <mustafa.ismail@xxxxxxxxx>
> > Signed-off-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
> > ---
> >   drivers/infiniband/ulp/isert/ib_isert.c | 33 +++++++++++++++++++++++--
> --------
> >   drivers/infiniband/ulp/isert/ib_isert.h |  3 ++-
> >   2 files changed, 25 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> > b/drivers/infiniband/ulp/isert/ib_isert.c
> > index 7540488..f827b91 100644
> > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > @@ -109,19 +109,27 @@ static int isert_sg_tablesize_set(const char *val,
> const struct kernel_param *kp
> >   	struct ib_qp_init_attr attr;
> >   	int ret, factor;
> >
> > -	isert_conn->cq = ib_cq_pool_get(ib_dev, cq_size, -1,
> IB_POLL_WORKQUEUE);
> > -	if (IS_ERR(isert_conn->cq)) {
> > -		isert_err("Unable to allocate cq\n");
> > -		ret = PTR_ERR(isert_conn->cq);
> > +	isert_conn->snd_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
> > +					    IB_POLL_WORKQUEUE);
> > +	if (IS_ERR(isert_conn->snd_cq)) {
> > +		isert_err("Unable to allocate send cq\n");
> > +		ret = PTR_ERR(isert_conn->snd_cq);
> >   		return ERR_PTR(ret);
> >   	}
> > +	isert_conn->rcv_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
> > +					    IB_POLL_WORKQUEUE);
> > +	if (IS_ERR(isert_conn->rcv_cq)) {
> > +		isert_err("Unable to allocate receive cq\n");
> > +		ret = PTR_ERR(isert_conn->rcv_cq);
> > +		goto create_cq_err;
> > +	}
> 
> Does this have any noticeable performance implications?

Initial testing seems to indicate this change causes significant performance variability specifically only with 2K Writes.
We suspect that may be due an unfortunate vector placement where the snd_cq and rcv_cq are on different numa nodes. 
We can, in the patch, alter the second CQ creation to pass comp_vector to insure they are hinted to the same affinity.

> Also I wander if there are any other assumptions in the code for having a
> single context processing completions...

We don't see any.

> It'd be much easier if iscsi_allocate_cmd could accept a timeout to fail...
> 
> CCing target-devel and Mike.

Do you mean add a timeout to the wait or removing the call to iscsit_wait_for_tag() iscsit_allocate_cmd()?




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux