Re: Leak of tpg->np_login_sem, possibly due to connection interruptions

Paul Dagnelie <paul.dagnelie@xxxxxxxxxxxx> · Fri, 9 Aug 2024 00:25:45 +0000

Hey all,

I've done some more digging into this. I still would really appreciate any advice that folks have on how to root cause and fix this bug.

I have access to a core file from a system that was taken while the system was suffering from this issue. In that core dump, we can see that the thread in __transport_wait_for_tasks is waiting for the LUN_RESET command to complete. This lead me to realize that in the syslog output, the LUN_RESET message that occured when the issue first happened is different from the other LUN_RESET commands I see: We never get the "LUN_RESET: TMR for [iblock] Complete" message. That lead me to look for the thread that is blocked in processing the LUN_RESET command. That thread's stack trace looks like this:

0xffff9416b0fa2080 UNINTERRUPTIBLE       4
                  __schedule+0x2bd
                  ...
                  target_put_cmd_and_wait+0x5a
                  core_tmr_drain_state_list
                  core_tmr_lun_reset+0x4e3
                  target_tmr_work+0xd1
                  ...

The command *that* thread is waiting for has a t_state of TRANSPORT_WRITE_PENDING, and it's transport_state is CMD_T_ABORTED. However, it still has a cmd_kref value of 2, which is why the LUN_RESET command can't proceed. It looks like it's a write command (execute_cmd is sbc_execute_rw and data_direction is DMA_TO_DEVICE). I'm still investigating further to try to understand how this state of offairs could occur. Any insight or information anyone could provide would be greatly appreciated.

________________________________________
From: Paul Dagnelie <paul.dagnelie@xxxxxxxxxxxx>
Sent: Wednesday, August 7, 2024 4:17 PM
To: target-devel@xxxxxxxxxxxxxxx <target-devel@xxxxxxxxxxxxxxx>
Cc: mlombard@xxxxxxxxxx <mlombard@xxxxxxxxxx>; David Mendez <david.mendez@xxxxxxxxxxxx>
Subject: Re: Leak of tpg->np_login_sem, possibly due to connection interruptions

I think I've come to the conclusion that I initially misdiagnosed the precise issue here. It looks like what's actually happening is that the  iscsit_close_connection call is hanging indefinitely.

One thing that confused me about my findings is the flow where connection reinstatement was triggered. The "iSCSI security negotiation completed successfully" line indicates that we're in iscsi_target_do_authentication, which means we're in iscsi_target_handle_csg_zero. However, after the call to handle_csg_zero, we should call into iscsi_target_do_tx_login_io, which always prints a line about "Sending Login Response". I don't see that in my logs. Thus, maybe what's actually happening here is that the `iscsit_cause_connection_reinstatement` blocks forever. That is supported because we have a (somewhat mangled) stack trace of a thread doing that, sitting around for the last several hours:

0xffff909731e2c100 UNINTERRUPTIBLE       1
                  __schedule+0x2bd
                  __schedule+0x2bd
                  schedule+0x69
                  schedule_timeout+0x208
                  wait_for_completion+0x8e
                  wait_for_completion+0x8e
                  wait_for_completion+0x8e
                  wait_for_completion+0x8e
                  iscsit_cause_connection_reinstatement+0x9e
                  iscsit_stop_session+0x101
                  iscsi_check_for_session_reinstatement+0x1e8
                  iscsi_target_check_for_existing_instances+0x21
                  iscsi_target_do_login+0x344
                  iscsi_target_do_login+0x344
                  iscsi_target_do_login+0x344
                  iscsi_target_do_login_rx+0x24b
                  process_one_work+0x222
                  worker_thread+0x4d
                  kthread+0x127
                  ret_from_fork+0x1f

That stack trace indicates that we're sitting in the conn_wait_comp completion, which is triggered from iscsit_close_connection. That thread appears to be sitting in iscsit_free_cmd+0x58, and has been again for some time (I removed some doubled lines this time, number of dupes indicated in brackets):

0xffff9098459d2080 UNINTERRUPTIBLE       1
                  __schedule+0x2bd [2]
                  schedule+0x69
                  schedule_timeout+0x18f
                  wait_for_completion_timeout+0x87 [4]
                  __transport_wait_for_tasks+0xd3 [2]
                  transport_generic_free_cmd+0x11c [2]
                  iscsit_free_cmd+0x58
                  iscsit_close_connection+0x48e [2]
                  iscsit_take_action_for_connection_exit+0x86 [2]
                  iscsi_target_tx_thread+0x1a5
                  kthread+0x127
                  ret_from_fork+0x1f

Doing some reconstruction, it looks like in iscsit_close_connection we called iscsit_release_commands_from_conn, which calls iscsit_free_cmd, which calls transport_generic_free_cmd, which calls target_wait_free_cmd, __transport_wait_for_tasks, and that's where we're hanging waiting for t_transport_stop_comp. We're now well outside of my area of knowledge of how the target code works, but hopefully that can give someone some ideas of what could be causing this issue.

________________________________________
From: Paul Dagnelie
Sent: Wednesday, August 7, 2024 1:34 AM
To: target-devel@xxxxxxxxxxxxxxx <target-devel@xxxxxxxxxxxxxxx>
Cc: mlombard@xxxxxxxxxx <mlombard@xxxxxxxxxx>; David Mendez <david.mendez@xxxxxxxxxxxx>
Subject: Leak of tpg->np_login_sem, possibly due to connection interruptions

Hey all,

I'm investigating an issue that we've been seeing on and off for about a year and a half now. We're running on Linux 5.15 with a couple custom patches, but none that I think would be relevant to this issue. A customer (usually with a somewhat flaky network) will report that they're getting constant "iSCSI Login timeout on Network Portal 0.0.0.0:3260" messages because the iSCSI Target login thread will wait on the np_login_sem  semaphore until it gets interrupted by the timer timeout.

We can see a thread blocked on the semaphore if we look at the stack traces on the system:

0xffff8bdf62f2ac80 INTERRUPTIBLE         1
                  __schedule+0x2c1
                  schedule+0x33
                  schedule_timeout+0x205
                  __down_interruptible+0xbb
                  down_interruptible+0x4b
                  iscsit_access_np+0x5a
                  iscsi_target_locate_portal+0x429
                  __iscsi_target_login_thread+0x332
                  iscsi_target_login_thread+0x6f3
                  kthread+0x120
                  ret_from_fork+0x1f

And using bpftrace, we can see that all calls to iscsit_access_np with one tpg return -1 when the timeout hits.

I've also managed to run some bpftrace when the problem first starts happening. The problem first starts when we get a LUN_RESET (XXX here are redacted, but all the lines have the same iqn in the original logs):

Aug  6 23:24:19 HOSTNAME kernel: [24805.722904] LUN_RESET: TMR caller
fabric: iSCSI initiator port iqn.1991-05.com.XXX:XXX.XXX.XXX
Aug  6 23:24:19 HOSTNAME kernel: [24805.722907] LUN_RESET: TMR starting for [iblock], tas: 1
Aug  6 23:24:19 HOSTNAME kernel: [24805.722919] LUN_RESET: ITT[0x96786300] -  pr_res_key: 0x0000000000000000

Twenty seconds later, we then start doing the iscsi login flow. The bpftrace that shows that an access_np call comes in for that tpg, but no corresponding deaccess_np call ever comes in. This system also has pr_debug printing enabled for various files, which allows us to see that iscsit_take_action_for_connection_exit was called, it looks like because connection reinstatement was triggered:

Aug  6 23:24:39 HOSTNAME kernel: [24825.690817] iSCSI security negotiation completed successfully.
Aug  6 23:24:39 HOSTNAME kernel: [24825.690826] Normal iSCSI Session SID 20 is still active for iqn.1991-05.com.XXX:XXX.XXX.XXX, performing session reinstatement.
Aug  6 23:24:39 HOSTNAME kernel: [24825.690844] Moving to TARG_CONN_STATE_CLEANUP_WAIT.
Aug  6 23:24:39 HOSTNAME kernel: [24825.690847] Performing cleanup for failed iSCSI Connection ID: 1 from iqn.1991-05.com.XXX:XXX.XXX.XXX

As a result, it seems like there is a way for the np_login_sem hold to leak.  It seems like what's happening is that after the LUN_RESET, the session somehow wasn't closed. As a result, the login process decides to try to reinstate the old session, which tries to reinstate the old connection. This kills the tx and rx thread, and one of them invokes iscsit_take_action_for_connection_exit, cancelling the login attempt and killing the delayed worker (who would usually be responsible for calling deaccess_np). As a result, the semaphore is never released.

My current best idea for how to fix this (if it is the issue I think it is) is to somehow detect in the connection exit code that we're holding the semaphore, and forcibly release it in that case. However, I'm not familiar enough with this code path to know if that idea is feasible, or if it would break another valid flow through this code. I would welcome any alternative explanation for the symptoms and logging I've been seeing, as well as any other proposals on a way to fix this. We have a customer system that reliably hits this state within a couple hours after booting, which is causing them a lot of grief.

Thanks,
Paul Dagnelie

This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.