Re: [PATCH for-rc] IB/isert: Fix hang in iscsit_wait_for_tag

Mike Christie <michael.christie@xxxxxxxxxx> · Tue, 7 Mar 2023 11:05:07 -0600

On 3/7/23 5:47 AM, Sagi Grimberg wrote:
>> [  220.131709] isert: isert_allocate_cmd: Unable to allocate iscsit_cmd + isert_cmd
>> [  220.131712] isert: isert_allocate_cmd: Unable to allocate iscsit_cmd + isert_cmd
>> [  280.862544] ABORT_TASK: Found referenced iSCSI task_tag: 70
>> [  313.265156] iSCSI Login timeout on Network Portal 5.1.1.21:3260
>> [  334.769268] INFO: task kworker/32:3:1285 blocked for more than 30 seconds.
>> [  334.769272]       Tainted: G           OE      6.2.0-rc3 #6
>> [  334.769274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  334.769275] task:kworker/32:3    state:D stack:0     pid:1285  ppid:2      flags:0x00004000
>> [  334.769279] Workqueue: events target_tmr_work [target_core_mod]
>> [  334.769307] Call Trace:
>> [  334.769308]  <TASK>
>> [  334.769310]  __schedule+0x318/0xa30
>> [  334.769316]  ? _prb_read_valid+0x22e/0x2b0
>> [  334.769319]  ? __pfx_schedule_timeout+0x10/0x10
>> [  334.769322]  ? __wait_for_common+0xd3/0x1e0
>> [  334.769323]  schedule+0x57/0xd0
>> [  334.769325]  schedule_timeout+0x273/0x320
>> [  334.769327]  ? __irq_work_queue_local+0x39/0x80
>> [  334.769330]  ? irq_work_queue+0x3f/0x60
>> [  334.769332]  ? __pfx_schedule_timeout+0x10/0x10
>> [  334.769333]  __wait_for_common+0xf9/0x1e0
>> [  334.769335]  target_put_cmd_and_wait+0x59/0x80 [target_core_mod]
>> [  334.769351]  core_tmr_abort_task.cold.8+0x187/0x202 [target_core_mod]
>> [  334.769369]  target_tmr_work+0xa1/0x110 [target_core_mod]
>> [  334.769384]  process_one_work+0x1b0/0x390
>> [  334.769387]  worker_thread+0x40/0x380
>> [  334.769389]  ? __pfx_worker_thread+0x10/0x10
>> [  334.769391]  kthread+0xfa/0x120
>> [  334.769393]  ? __pfx_kthread+0x10/0x10
>> [  334.769395]  ret_from_fork+0x29/0x50
>> [  334.769399]  </TASK>
>> [  334.769442] INFO: task iscsi_np:5337 blocked for more than 30 seconds.
>> [  334.769444]       Tainted: G           OE      6.2.0-rc3 #6
>> [  334.769444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  334.769445] task:iscsi_np        state:D stack:0     pid:5337  ppid:2      flags:0x00004004
>> [  334.769447] Call Trace:
>> [  334.769447]  <TASK>
>> [  334.769448]  __schedule+0x318/0xa30
>> [  334.769451]  ? __pfx_schedule_timeout+0x10/0x10
>> [  334.769453]  ? __wait_for_common+0xd3/0x1e0
>> [  334.769454]  schedule+0x57/0xd0
>> [  334.769456]  schedule_timeout+0x273/0x320
>> [  334.769459]  ? iscsi_update_param_value+0x27/0x70 [iscsi_target_mod]
>> [  334.769476]  ? __kmalloc_node_track_caller+0x52/0x130
>> [  334.769478]  ? __pfx_schedule_timeout+0x10/0x10
>> [  334.769480]  __wait_for_common+0xf9/0x1e0
>> [  334.769481]  iscsi_check_for_session_reinstatement+0x1e8/0x280 [iscsi_target_mod]

The hang here might be this issue:

https://lore.kernel.org/linux-scsi/c1a395a3-74e2-c77f-c8e6-1cade30dfac6@xxxxxxxxxx/T/#mdb29702f7c345eb7e3631d58e3ac7fac26e15fee

That version had some bugs, so I'm working on a new version.

>> [  334.769496]  iscsi_target_do_login+0x23b/0x570 [iscsi_target_mod]
>> [  334.769508]  iscsi_target_start_negotiation+0x55/0xc0 [iscsi_target_mod]
>> [  334.769519]  iscsi_target_login_thread+0x675/0xeb0 [iscsi_target_mod]
>> [  334.769531]  ? __pfx_iscsi_target_login_thread+0x10/0x10 [iscsi_target_mod]
>> [  334.769541]  kthread+0xfa/0x120
>> [  334.769543]  ? __pfx_kthread+0x10/0x10
>> [  334.769544]  ret_from_fork+0x29/0x50
>> [  334.769547]  </TASK>
>>
>>
>> [  185.734571] isert: isert_allocate_cmd: Unable to allocate iscsit_cmd + isert_cmd
>> [  246.032360] ABORT_TASK: Found referenced iSCSI task_tag: 75

Or, if there is only one session, LIO might be waiting for outstanding commands
to complete before allowing a new login.

Or it could be a combination of both.

>> [  278.442726] iSCSI Login timeout on Network Portal 5.1.1.21:3260
>>
>>
>> By the way, increasing tag_num in iscsi_target_locate_portal() will also avoid the issue.
>>
>> Any thoughts on what could be causing this hang?
> 
> I know that Mike just did a set of fixes on the session teardown area...
> Perhaps you should try with the patchset "target: TMF and recovery
> fixes" applied?
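
On the tag_num remark quoted above: a toy userspace sketch of a fixed-size tag
pool may help show why a larger pool makes the "Unable to allocate" failure
harder to hit. This is only an illustration and not the LIO code. All toy_*
names and TOY_MAX_TAGS are made up for the example, and the real allocator
differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the LIO/isert tag allocator. All names are hypothetical. */
#define TOY_MAX_TAGS 64

struct toy_tag_pool {
	bool in_use[TOY_MAX_TAGS];
	size_t tag_num;	/* pool size, capped at TOY_MAX_TAGS */
};

static void toy_pool_init(struct toy_tag_pool *p, size_t tag_num)
{
	if (tag_num > TOY_MAX_TAGS)
		tag_num = TOY_MAX_TAGS;
	p->tag_num = tag_num;
	for (size_t i = 0; i < TOY_MAX_TAGS; i++)
		p->in_use[i] = false;
}

/* Return a free tag, or -1 when every tag is held by an outstanding command. */
static int toy_tag_get(struct toy_tag_pool *p)
{
	for (size_t i = 0; i < p->tag_num; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Release a tag so that a later toy_tag_get() can hand it out again. */
static void toy_tag_put(struct toy_tag_pool *p, int tag)
{
	assert(tag >= 0 && (size_t)tag < p->tag_num && p->in_use[tag]);
	p->in_use[tag] = false;
}
```

In this model, once tag_num commands are outstanding, toy_tag_get() keeps
failing until something calls toy_tag_put(). If the commands holding the tags
never complete, a caller that waits for a tag waits forever. A bigger tag_num
only moves the point at which that happens.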