> On Jul 29, 2020, at 8:03 AM, Hou Pu <houpu@xxxxxxxxxxxxx> wrote:
>
> The iscsi target login thread might stuck in following stack:
>
> cat /proc/`pidof iscsi_np`/stack
> [<0>] down_interruptible+0x42/0x50
> [<0>] iscsit_access_np+0xe3/0x167
> [<0>] iscsi_target_locate_portal+0x695/0x8ac
> [<0>] __iscsi_target_login_thread+0x855/0xb82
> [<0>] iscsi_target_login_thread+0x2f/0x5a
> [<0>] kthread+0xfa/0x130
> [<0>] ret_from_fork+0x1f/0x30
>
> This could be reproduced by following steps:
> 1. Initiator A try to login iqn1-tpg1 on port 3260. After finishing
>    PDU exchange in the login thread and before the negotiation is
>    finished, at this time the network link is down. In a production
>    environment, this could happen. I could emulated it by bring
>    the network card down in the initiator node by ifconfig eth0 down.
>    (Now A could never finish this login. And tpg->np_login_sem is
>    hold by it).
> 2. Initiator B try to login iqn2-tpg1 on port 3260. After finishing
>    PDU exchange in the login thread. The target expect to process
>    remaining login PDUs in workqueue context.
> 3. Initiator A' try to re-login to iqn1-tpg1 on port 3260 from
>    a new socket. It will wait for tpg->np_login_sem with
>    np->np_login_timer loaded to wait for at most 15 second.
>    (Because the lock is held by A. A never gets a change to
>    release tpg->np_login_sem. so A' should finally get timeout).
> 4. Before A' got timeout. Initiator B gets negotiation failed and
>    calls iscsi_target_login_drop()->iscsi_target_login_sess_out().
>    The np->np_login_timer is canceled. And initiator A' will hang
>    there forever. Because A' is now in the login thread. All other
>    login requests could not be serviced.

iqn1 and iqn2 are different targets, right? It’s not clear to me how
initiator B failing negotiation cancels the timer for the portal under a
different iqn/target. Is iqn2-tpg1->np1 a different struct than
iqn1-tpg1->np1?

I mean, would iscsit_get_tpg_from_np return a different np struct for
initiator B than for A?
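
To make the question concrete, here is a toy user-space model. It is not
the kernel code: every toy_ name is made up for illustration, and the
timer is reduced to a flag. It only shows that B's failure path can disarm
the timer A' relies on if both TPGs point at the same portal object, and
cannot if they point at different ones.

```c
/* Toy user-space model, NOT kernel code. All toy_ structs and functions
 * are hypothetical stand-ins used only to illustrate the question. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_np {                 /* stands in for a network portal (IP:port) */
    bool login_timer_armed;     /* stands in for np->np_login_timer */
};

struct toy_tpg {                /* stands in for a target portal group */
    struct toy_np *np;          /* portal object this TPG is reached through */
};

/* A login arms the portal's timer before it blocks waiting for the TPG. */
void toy_start_login_timer(struct toy_tpg *tpg)
{
    assert(tpg != NULL && tpg->np != NULL);
    tpg->np->login_timer_armed = true;
}

/* A failed login cancels the timer of whatever portal its TPG points at. */
void toy_login_failed(struct toy_tpg *tpg)
{
    assert(tpg != NULL && tpg->np != NULL);
    tpg->np->login_timer_armed = false;
}

/* Returns true if A' still has its timeout pending after B's failure. */
bool toy_waiter_keeps_timer(bool shared_np)
{
    struct toy_np np1 = { false };
    struct toy_np np2 = { false };
    struct toy_tpg iqn1_tpg1 = { &np1 };
    struct toy_tpg iqn2_tpg1 = { shared_np ? &np1 : &np2 };

    toy_start_login_timer(&iqn1_tpg1);  /* step 3: A' waits with a timeout */
    toy_login_failed(&iqn2_tpg1);       /* step 4: B fails negotiation */
    return np1.login_timer_armed;
}
```

With toy_waiter_keeps_timer(false) the flag stays set, so A' would still
time out. With toy_waiter_keeps_timer(true) it is cleared, which is the
hang described in step 4. So the scenario seems to depend on both targets
sharing one np for port 3260.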