Re: BUG in stress login-logout to multiple IQNs

Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> · Mon, 16 Feb 2015 20:56:16 +0200

On 2/12/2015 10:47 AM, Nicholas A. Bellinger wrote:
On Wed, 2015-02-11 at 10:17 +0200, Sagi Grimberg wrote:
Hey Nic,

So Our QA guys recently stepped on this bug when performing stress
login-logout from a single initiator to 10 targets each exposed over
4 portals, so overall 40 sessions (needless to say we are talking on
iser...). So there are lots of logins in parallel with lots of logouts.

It seems that the connection termination causes iscsi_tx_thread to
access the connection after it is freed or something (list corruption
probably coming from iscsit_handle_immediate_queue or
iscsit_handle_response_queue, and NULL deref coming from
iscsit_take_action_for_connection_exit).

Note, isert_wait_conn waits for session commands and QP flush which is
normally pretty fast, the conn termination is done in a work that waits
for DISCONNECTED event which might take longer (which is why we do it
outside wait_conn context to avoid blocking it).

I didn't get too far with this until now, do you have any idea on what
might have happened?

Mmm, it looks like iscsit_take_action_for_connection_exit() in TX thread
context is calling iscsi_close_connection() after hitting the following
check in iscsi_target_erl0.c:

         if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT) {
                 spin_unlock_bh(&conn->state_lock);
                 iscsit_close_connection(conn);
                 return;
         }

.. once iscsi_close_connection() has already being called earlier by
iser-target code.

AFAICT, this check is specific to traditional iscsi-target during the
iscsit_handle_logout_cmd() failure case in iscsi_target_rx_opcode().

Here's a quick patch to make this case ISCSI_TCP only.

Thanks Sagi.

--nab

diff --git a/drivers/target/iscsi/iscsi_target_erl0.c b/drivers/target/iscsi/iscsi_target_erl0.c
index bdd8731..1c197ba 100644
--- a/drivers/target/iscsi/iscsi_target_erl0.c
+++ b/drivers/target/iscsi/iscsi_target_erl0.c
@@ -22,6 +22,7 @@
  #include <target/target_core_fabric.h>

  #include <target/iscsi/iscsi_target_core.h>
+#include <target/iscsi/iscsi_transport.h>
  #include "iscsi_target_seq_pdu_list.h"
  #include "iscsi_target_tq.h"
  #include "iscsi_target_erl0.h"
@@ -939,7 +940,8 @@ void iscsit_take_action_for_connection_exit(struct iscsi_conn *conn)

         if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT) {
                 spin_unlock_bh(&conn->state_lock);
-               iscsit_close_connection(conn);
+               if (conn->conn_transport->transport_type == ISCSI_TCP)
+                       iscsit_close_connection(conn);
                 return;
         }



Don't seem to hit a NULL deref with this one (although I'm not sure what
is the root cause here - might pop up again).

But I do see iscsi_trx thread hang since we can't seem to deallocate
it... kinda hard to understand all the completions/signals.

kernel: INFO: task iscsi_trx:18456 blocked for more than 120 seconds.
kernel:      Tainted: G           OE  3.18.0-rc2+ #28
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
kernel: iscsi_trx       D 0000000000000003     0 18456      2 0x00000000
kernel: ffff880444297c48 0000000000000046 ffff880444297c18 ffff880444294010
kernel: 0000000000011b00 0000000000011b00 ffff880441d88d50 ffff88046fa60190
kernel: ffff880444297c28 ffff8800747f3f00 7fffffffffffffff ffff880444297d08
kernel: Call Trace:
kernel: [<ffffffff815332b4>] schedule+0x24/0x70
kernel: [<ffffffff815359c5>] schedule_timeout+0x1a5/0x210
kernel: [<ffffffff810705f6>] ? ttwu_queue+0xe6/0x100
kernel: [<ffffffff81074821>] ? try_to_wake_up+0x1a1/0x230
kernel: [<ffffffff8107709b>] ? sched_clock_cpu+0xab/0xc0
kernel: [<ffffffff81534396>] wait_for_completion+0xc6/0x100
kernel: [<ffffffff810748b0>] ? try_to_wake_up+0x230/0x230
kernel: [<ffffffffa0367460>] ? iscsi_target_rx_opcode+0x2b0/0x2b0 
[iscsi_target_mod]
kernel: [<ffffffffa0367460>] ? iscsi_target_rx_opcode+0x2b0/0x2b0 
[iscsi_target_mod]
kernel: [<ffffffff81069d89>] kthread_stop+0x49/0x80
kernel: [<ffffffffa034fd56>] iscsi_deallocate_thread_one+0x76/0xc0 
[iscsi_target_mod]
kernel: [<ffffffffa034fdcb>] 
iscsi_deallocate_extra_thread_sets+0x2b/0x80 [iscsi_target_mod]
kernel: [<ffffffffa0350845>] iscsi_rx_thread_pre_handler+0x135/0x170 
[iscsi_target_mod]
kernel: [<ffffffffa03674d4>] iscsi_target_rx_thread+0x74/0x310 
[iscsi_target_mod]
kernel: [<ffffffff81532e85>] ? __schedule+0x355/0x650
kernel: [<ffffffffa0367460>] ? iscsi_target_rx_opcode+0x2b0/0x2b0 
[iscsi_target_mod]
kernel: [<ffffffff8106994e>] kthread+0xce/0xf0
kernel: [<ffffffff81069880>] ? kthread_freezable_should_stop+0x70/0x70
kernel: [<ffffffff815369ac>] ret_from_fork+0x7c/0xb0
kernel: [<ffffffff81069880>] ? kthread_freezable_should_stop+0x70/0x70
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html