> Subject: Re: [PATCH for-rc] IB/isert: Fix hang in iscsit_wait_for_tag
>
>
> On 1/30/23 20:22, Devale, Sindhu wrote:
> >
> >
> >> Subject: Re: [PATCH for-rc] IB/isert: Fix hang in iscsit_wait_for_tag
> >>
> >>
> >>> From: Mustafa Ismail <mustafa.ismail@xxxxxxxxx>
> >>>
> >>> Running fio can occasionally cause a hang when sbitmap_queue_get()
> >>> fails to return a tag in iscsit_allocate_cmd() and
> >>> iscsit_wait_for_tag() is called and will never return from the
> >>> schedule(). This is because the polling thread of the CQ is
> >>> suspended, and will not poll for a SQ completion which would free up a tag.
> >>> Fix this by creating a separate CQ for the SQ so that send
> >>> completions are processed on a separate thread and are not blocked
> >>> when the RQ CQ is stalled.
> >>>
> >>> Fixes: 10e9cbb6b531 ("scsi: target: Convert target drivers to use
> >>> sbitmap")
> >>
> >> Is this the real offending commit? What prevented this from happening
> >> before?
> >
> > Maybe going to a global bitmap instead of per-cpu ida makes it less likely
> > to occur.
> > Going to a single CQ may be the real root cause, in commit
> > 6f0fae3d7797 ("iser-target: Use single CQ for TX and RX").
>
> Yes this is more likely.
>
> >
> >>> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
> >>> Signed-off-by: Mustafa Ismail <mustafa.ismail@xxxxxxxxx>
> >>> Signed-off-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
> >>> ---
> >>>  drivers/infiniband/ulp/isert/ib_isert.c | 33 +++++++++++++++++++++++----------
> >>>  drivers/infiniband/ulp/isert/ib_isert.h |  3 ++-
> >>>  2 files changed, 25 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> >>> index 7540488..f827b91 100644
> >>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> >>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> >>> @@ -109,19 +109,27 @@ static int isert_sg_tablesize_set(const char *val, const struct kernel_param *kp
> >>>  	struct ib_qp_init_attr attr;
> >>>  	int ret, factor;
> >>>
> >>> -	isert_conn->cq = ib_cq_pool_get(ib_dev, cq_size, -1, IB_POLL_WORKQUEUE);
> >>> -	if (IS_ERR(isert_conn->cq)) {
> >>> -		isert_err("Unable to allocate cq\n");
> >>> -		ret = PTR_ERR(isert_conn->cq);
> >>> +	isert_conn->snd_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
> >>> +					    IB_POLL_WORKQUEUE);
> >>> +	if (IS_ERR(isert_conn->snd_cq)) {
> >>> +		isert_err("Unable to allocate send cq\n");
> >>> +		ret = PTR_ERR(isert_conn->snd_cq);
> >>>  		return ERR_PTR(ret);
> >>>  	}
> >>> +	isert_conn->rcv_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
> >>> +					    IB_POLL_WORKQUEUE);
> >>> +	if (IS_ERR(isert_conn->rcv_cq)) {
> >>> +		isert_err("Unable to allocate receive cq\n");
> >>> +		ret = PTR_ERR(isert_conn->rcv_cq);
> >>> +		goto create_cq_err;
> >>> +	}
> >>
> >> Does this have any noticeable performance implications?
> >
> > Initial testing seems to indicate this change causes significant performance
> > variability, but only with 2K writes.
> > We suspect that may be due to an unfortunate vector placement where the
> > snd_cq and rcv_cq are on different NUMA nodes.
> > We can, in the patch, alter the second CQ creation to pass a comp_vector to
> > ensure they are hinted to the same affinity.
> > Even so, there are now two competing threads for completion processing.
> >
> >> Also I wonder if there are any other assumptions in the code for
> >> having a single context processing completions...
> >
> > We don't see any.
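
For illustration, passing the same comp_vector hint to both CQ allocations (instead of -1) could look roughly like the sketch below. Everything except the comp_vector selection follows the posted patch; the hint choice here (current CPU modulo the device's completion vectors) is only an example, not what the patch does:

	/*
	 * Sketch only: hint both CQs to the same completion vector so the
	 * send and receive completion work shares affinity, rather than
	 * letting ib_cq_pool_get() pick a vector independently with -1.
	 */
	int comp_vector = raw_smp_processor_id() % ib_dev->num_comp_vectors;

	isert_conn->snd_cq = ib_cq_pool_get(ib_dev, cq_size, comp_vector,
					    IB_POLL_WORKQUEUE);
	if (IS_ERR(isert_conn->snd_cq)) {
		isert_err("Unable to allocate send cq\n");
		ret = PTR_ERR(isert_conn->snd_cq);
		return ERR_PTR(ret);
	}

	isert_conn->rcv_cq = ib_cq_pool_get(ib_dev, cq_size, comp_vector,
					    IB_POLL_WORKQUEUE);
	if (IS_ERR(isert_conn->rcv_cq)) {
		isert_err("Unable to allocate receive cq\n");
		ret = PTR_ERR(isert_conn->rcv_cq);
		goto create_cq_err;
	}
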
> >
> >> It'd be much easier if iscsi_allocate_cmd could accept a timeout to fail...
> >>
> >> CCing target-devel and Mike.
> >
> > Do you mean adding a timeout to the wait, or removing the call
> > to iscsit_wait_for_tag() in iscsit_allocate_cmd()?
>
> Looking at the code, passing it TASK_RUNNING will make it fail if there is no
> available tag (and hence drop the received command, let the initiator retry). But I
> also think that isert may need a deeper default queue depth...

Hi Sagi -

Mustafa reports -

"The problem is not easily reproduced, so I reduced the number of map_tags allocated while testing a potential fix.

Passing TASK_RUNNING, I got the following call trace:

[ 220.131709] isert: isert_allocate_cmd: Unable to allocate iscsit_cmd + isert_cmd
[ 220.131712] isert: isert_allocate_cmd: Unable to allocate iscsit_cmd + isert_cmd
[ 280.862544] ABORT_TASK: Found referenced iSCSI task_tag: 70
[ 313.265156] iSCSI Login timeout on Network Portal 5.1.1.21:3260
[ 334.769268] INFO: task kworker/32:3:1285 blocked for more than 30 seconds.
[ 334.769272] Tainted: G OE 6.2.0-rc3 #6
[ 334.769274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 334.769275] task:kworker/32:3 state:D stack:0 pid:1285 ppid:2 flags:0x00004000
[ 334.769279] Workqueue: events target_tmr_work [target_core_mod]
[ 334.769307] Call Trace:
[ 334.769308]  <TASK>
[ 334.769310]  __schedule+0x318/0xa30
[ 334.769316]  ? _prb_read_valid+0x22e/0x2b0
[ 334.769319]  ? __pfx_schedule_timeout+0x10/0x10
[ 334.769322]  ? __wait_for_common+0xd3/0x1e0
[ 334.769323]  schedule+0x57/0xd0
[ 334.769325]  schedule_timeout+0x273/0x320
[ 334.769327]  ? __irq_work_queue_local+0x39/0x80
[ 334.769330]  ? irq_work_queue+0x3f/0x60
[ 334.769332]  ? __pfx_schedule_timeout+0x10/0x10
[ 334.769333]  __wait_for_common+0xf9/0x1e0
[ 334.769335]  target_put_cmd_and_wait+0x59/0x80 [target_core_mod]
[ 334.769351]  core_tmr_abort_task.cold.8+0x187/0x202 [target_core_mod]
[ 334.769369]  target_tmr_work+0xa1/0x110 [target_core_mod]
[ 334.769384]  process_one_work+0x1b0/0x390
[ 334.769387]  worker_thread+0x40/0x380
[ 334.769389]  ? __pfx_worker_thread+0x10/0x10
[ 334.769391]  kthread+0xfa/0x120
[ 334.769393]  ? __pfx_kthread+0x10/0x10
[ 334.769395]  ret_from_fork+0x29/0x50
[ 334.769399]  </TASK>
[ 334.769442] INFO: task iscsi_np:5337 blocked for more than 30 seconds.
[ 334.769444] Tainted: G OE 6.2.0-rc3 #6
[ 334.769444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 334.769445] task:iscsi_np state:D stack:0 pid:5337 ppid:2 flags:0x00004004
[ 334.769447] Call Trace:
[ 334.769447]  <TASK>
[ 334.769448]  __schedule+0x318/0xa30
[ 334.769451]  ? __pfx_schedule_timeout+0x10/0x10
[ 334.769453]  ? __wait_for_common+0xd3/0x1e0
[ 334.769454]  schedule+0x57/0xd0
[ 334.769456]  schedule_timeout+0x273/0x320
[ 334.769459]  ? iscsi_update_param_value+0x27/0x70 [iscsi_target_mod]
[ 334.769476]  ? __kmalloc_node_track_caller+0x52/0x130
[ 334.769478]  ? __pfx_schedule_timeout+0x10/0x10
[ 334.769480]  __wait_for_common+0xf9/0x1e0
[ 334.769481]  iscsi_check_for_session_reinstatement+0x1e8/0x280 [iscsi_target_mod]
[ 334.769496]  iscsi_target_do_login+0x23b/0x570 [iscsi_target_mod]
[ 334.769508]  iscsi_target_start_negotiation+0x55/0xc0 [iscsi_target_mod]
[ 334.769519]  iscsi_target_login_thread+0x675/0xeb0 [iscsi_target_mod]
[ 334.769531]  ? __pfx_iscsi_target_login_thread+0x10/0x10 [iscsi_target_mod]
[ 334.769541]  kthread+0xfa/0x120
[ 334.769543]  ? __pfx_kthread+0x10/0x10
[ 334.769544]  ret_from_fork+0x29/0x50
[ 334.769547]  </TASK>

[ 185.734571] isert: isert_allocate_cmd: Unable to allocate iscsit_cmd + isert_cmd
[ 246.032360] ABORT_TASK: Found referenced iSCSI task_tag: 75
[ 278.442726] iSCSI Login timeout on Network Portal 5.1.1.21:3260

By the way, increasing tag_num in iscsi_target_locate_portal() will also avoid the issue"

Any thoughts on what could be causing this hang?

Shiraz
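
For reference, the TASK_RUNNING experiment described above maps to roughly this change in isert_allocate_cmd() (sketch only; the current mainline code passes TASK_INTERRUPTIBLE, and the comments here are explanatory, not part of any posted patch):

	/* In isert_allocate_cmd(), use TASK_RUNNING instead of TASK_INTERRUPTIBLE: */
	cmd = iscsit_allocate_cmd(conn, TASK_RUNNING);
	if (!cmd) {
		/*
		 * With TASK_RUNNING, iscsit_allocate_cmd() returns NULL when no
		 * tag is available instead of sleeping in iscsit_wait_for_tag(),
		 * so the received command is dropped and the initiator retries.
		 */
		isert_err("Unable to allocate iscsit_cmd + isert_cmd\n");
		return NULL;
	}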