Hang in iscsit_access_np() related to tpg->np_login_sem

Hi folks,

I'd like to ask about the following patch:
https://www.spinics.net/lists/target-devel/msg18875.html

We've been hitting a similar issue in our customers' production
environments, with the same symptoms. We get constant
"iSCSI Login timeout on Network Portal 0.0.0.0:3260" messages
because the iSCSI target login thread waits on the np_login_sem
semaphore until the login timer expires and interrupts it. Here is the
stack trace of the waiting thread:

0xffff8bdf62f2ac80 INTERRUPTIBLE         1
                  __schedule+0x2c1
                  schedule+0x33
                  schedule_timeout+0x205
                  __down_interruptible+0xbb
                  down_interruptible+0x4b
                  iscsit_access_np+0x5a
                  iscsi_target_locate_portal+0x429
                  __iscsi_target_login_thread+0x332
                  iscsi_target_login_thread+0x6f3
                  kthread+0x120
                  ret_from_fork+0x1f

During that time there is no other login or login-related thread, which
leads us to believe that another thread acquired the semaphore
but never released it.

Looking through the login code it seems like there are two functions that
are expected to call up() on that semaphore by calling iscsit_deaccess_np():

[A] __iscsi_target_login_thread(): This is the same thread that acquired
    the semaphore (by calling iscsit_access_np()).
[B] iscsi_target_do_login_rx(): This is a delayed worker thread spawned
    by the thread in [A].

Looking at both of those codepaths, it seems like there is one case in each
path where we never call iscsit_deaccess_np() to release the semaphore.

For [A], that is when iscsi_target_start_negotiation() returns 0 towards the
end of __iscsi_target_login_thread().

For [B] that is if iscsi_target_do_login() returns 0 AND
iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_WRITE_ACTIVE)
returns 0.
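
To make the concern concrete, here is a minimal single-threaded userspace
model of what we suspect happens. It is not the kernel code: a plain counter
stands in for tpg->np_login_sem (assumed to start at 1), and
model_access_np(), model_deaccess_np() and model_login() are hypothetical
names made up for this sketch. Whether the two paths above really return
without releasing the semaphore is exactly what we are unsure about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for tpg->np_login_sem; assumed to start at 1. */
static int np_login_count = 1;

/* Models iscsit_access_np(). The kernel sleeps in down_interruptible();
 * this model returns false instead so that it can never hang. */
static bool model_access_np(void)
{
    if (np_login_count == 0)
        return false;
    np_login_count--;
    return true;
}

/* Models iscsit_deaccess_np(), i.e. the up() on the semaphore. */
static void model_deaccess_np(void)
{
    np_login_count++;
    assert(np_login_count <= 1); /* this model expects at most one holder */
}

/* One login attempt. release_on_exit == false models a path that returns
 * without calling iscsit_deaccess_np(). A false return marks the point
 * where the kernel thread would wait until the login timeout fires. */
static bool model_login(bool release_on_exit)
{
    if (!model_access_np())
        return false;
    if (release_on_exit)
        model_deaccess_np();
    return true;
}
```

In this model, repeated calls to model_login(true) keep succeeding. After a
single model_login(false), every later call returns false, which would
correspond to the repeated login timeouts we observe.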

Since we have no expertise in this part of the kernel, I wanted to ask:
are the two scenarios above expected not to release the semaphore
on purpose, or is either of them a bug? If they are not bugs, where is the
semaphore expected to be released?

Any explanation or insight would be greatly appreciated.

Regards,
Serapheim


