Re: [PATCH] iscsi-target: fix hang in iscsit_access_np() when getting tpg->np_login_sem

On 2020/9/2 10:57 AM, Michael Christie wrote:


On Jul 29, 2020, at 8:03 AM, Hou Pu <houpu@xxxxxxxxxxxxx> wrote:

The iscsi target login thread might get stuck with the following stack:

cat /proc/`pidof iscsi_np`/stack
[<0>] down_interruptible+0x42/0x50
[<0>] iscsit_access_np+0xe3/0x167
[<0>] iscsi_target_locate_portal+0x695/0x8ac
[<0>] __iscsi_target_login_thread+0x855/0xb82
[<0>] iscsi_target_login_thread+0x2f/0x5a
[<0>] kthread+0xfa/0x130
[<0>] ret_from_fork+0x1f/0x30

This can be reproduced with the following steps:
1. Initiator A tries to log in to iqn1-tpg1 on port 3260. After the
   PDU exchange in the login thread has finished, but before
   negotiation has completed, the network link goes down. This can
   happen in a production environment; I emulated it by bringing the
   network card down on the initiator node with "ifconfig eth0 down".
   (Now A can never finish this login, and tpg->np_login_sem is
   held by it.)
2. Initiator B tries to log in to iqn2-tpg1 on port 3260. After the
   PDU exchange in the login thread has finished, the target expects
   to process the remaining login PDUs in workqueue context.
3. Initiator A' tries to re-login to iqn1-tpg1 on port 3260 from
   a new socket. It waits for tpg->np_login_sem with
   np->np_login_timer armed, so it should wait at most 15 seconds.
   (Because the lock is held by A and A never gets a chance to
   release tpg->np_login_sem, A' should eventually time out.)
4. Before A' times out, initiator B fails negotiation and calls
   iscsi_target_login_drop()->iscsi_target_login_sess_out(), which
   cancels np->np_login_timer. Initiator A' then hangs there forever.
   Because A' is in the login thread, no other login requests can be
   serviced.

iqn1 and iqn2 are different targets, right? It's not clear to me how initiator B failing negotiation cancels the timer for the portal under a different iqn/target.

iqn1-tpg1 in steps 1 and 3 is the same one (same target volume).
iqn2-tpg1 in step 2 is a different volume on the same host.
The configuration looks like this:

iqn1-tpg1:
root@storageXXX:/sys/kernel/config/target/iscsi# ls iqn.2010-10.org.openstack\:volume-00e50deb-5296-4f18-xxxx-106f96a880c8/tpgt_1/np/
10.129.77.16:3260

iqn2-tpg1:
root@storageXXX:/sys/kernel/config/target/iscsi# ls iqn.2010-10.org.openstack\:volume-86af15c6-c529-4715-xxxx-3c9ca068635d/tpgt_1/np/
10.129.77.16:3260

(I can provide more if needed.)


Is iqn2-tpg1->np1 a different struct than iqn1-tpg1->np1? I mean, would iscsit_get_tpg_from_np() return a different np struct for initiator B and for A?


iscsit_get_tpg_from_np() returned a different struct iscsi_portal_group
for initiators A and B, but struct iscsi_np is shared by them
because they have the same portal (IP address and port).
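The distinction made here (separate struct iscsi_portal_group objects, but one struct iscsi_np per IP address and port) can be illustrated with a toy lookup in C. This is a hypothetical model, not the target's real code: model_np, model_tpg, model_get_np() and model_get_tpg_from_np() are invented names, and the lookup only loosely follows what iscsit_get_tpg_from_np() is described as doing above. Because the np is keyed by ip:port alone, the two volumes in the configuration above resolve to different TPGs that point at the same np, so any state kept in the np, such as np_login_timer, is shared by both logins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct iscsi_np / struct iscsi_portal_group;
 * field names are illustrative only. */
struct model_np {
	const char *portal;      /* "ip:port", e.g. "10.129.77.16:3260" */
};

struct model_tpg {
	const char *iqn;         /* target that owns this TPG */
	struct model_np *np;     /* network portal the TPG is exported on */
};

/* Return the np for this ip:port, creating it only if none exists yet,
 * so every TPG exported on the same ip:port shares one np. */
struct model_np *model_get_np(struct model_np *nps, size_t cap,
			      size_t *count, const char *portal)
{
	assert(count != NULL && *count <= cap);
	for (size_t i = 0; i < *count; i++)
		if (strcmp(nps[i].portal, portal) == 0)
			return &nps[i];
	if (*count == cap)
		return NULL;             /* table full */
	nps[*count].portal = portal;
	return &nps[(*count)++];
}

/* Loosely modeled on iscsit_get_tpg_from_np(): pick the TPG of the
 * requested target among those bound to this np. */
struct model_tpg *model_get_tpg_from_np(struct model_tpg *tpgs, size_t n,
					const char *iqn,
					const struct model_np *np)
{
	for (size_t i = 0; i < n; i++)
		if (tpgs[i].np == np && strcmp(tpgs[i].iqn, iqn) == 0)
			return &tpgs[i];
	return NULL;
}
```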


Thanks,
Hou
