Re: BUG in stress login-logout to multiple IQNs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/21/2015 9:54 AM, Nicholas A. Bellinger wrote:
Hi Sagi,

Apologies for the delayed response..  Comments below.

On Sat, 2015-02-14 at 12:21 +0200, Sagi Grimberg wrote:
On 2/12/2015 10:47 AM, Nicholas A. Bellinger wrote:
On Wed, 2015-02-11 at 10:17 +0200, Sagi Grimberg wrote:
Hey Nic,

So Our QA guys recently stepped on this bug when performing stress
login-logout from a single initiator to 10 targets each exposed over
4 portals, so overall 40 sessions (needless to say we are talking on
iser...). So there are lots of logins in parallel with lots of logouts.

It seems that the connection termination causes iscsi_tx_thread to
access the connection after it is freed or something (list corruption
probably coming from iscsit_handle_immediate_queue or
iscsit_handle_response_queue, and NULL deref coming from
iscsit_take_action_for_connection_exit).

Note, isert_wait_conn waits for session commands and QP flush which is
normally pretty fast, the conn termination is done in a work that waits
for DISCONNECTED event which might take longer (which is why we do it
outside wait_conn context to avoid blocking it).

I didn't get too far with this until now, do you have any idea on what
might have happened?

Mmm, it looks like iscsit_take_action_for_connection_exit() in TX thread
context is calling iscsi_close_connection() after hitting the following
check in iscsi_target_erl0.c:

          if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT) {
                  spin_unlock_bh(&conn->state_lock);
                  iscsit_close_connection(conn);
                  return;
          }

.. once iscsi_close_connection() has already being called earlier by
iser-target code.

Not sure I understand where iscsit_close_connection is called earlier
by iser target. The iser code usually only notifies any problems to the
iscsi layer to do it's thing.

Care to explain how iscsit_close_connection might be called twice?


It appears iscsit_close_connection() is getting invoked first from
iscsi_trx context after isert_cq_comp_err() has previously called
iscsit_cause_connection_reinstatement() to force a connection failure to
occur during explicit logout + ISCSI_LOGOUT_REASON_CLOSE_SESSION
operation.

Yes, that is correct.


You can tell because isert_wait_conn() + isert_wait4cmds() debug output
appears before list_del corruption in iscsi_ttx context, which can only
be invoked via iscsit_close_connection() -> transport->wait_for_conn()
-> isert_wait_conn().

Given the fact that there are lots of connections tearing down in
parallel, I wouldn't rely on the proximity of these debug prints to
the stack trace.


Once iscsi_ttx context runs, it's hitting the TARG_CONN_STATE_IN_LOGOUT
state check in iscsit_take_action_for_connection_exit() and re-invokes
iscsit_close_connection(), after iscsit_logout_closesession() from
isert_rx_completion() context handles REASON_CLOSE_SESSION and changed
connection state to IN_LOGOUT, but before the logout response was posted
and successfully completed in isert_do_control_comp().

iscsit_take_action_for_connection_exit() is invoked both by RX/TX
threads. But only one should get to iscsit_close_connection() since
conn->connection_exit is set under conn->state_lock. I'd say that if
iscsit_close_connection() was invoked twice, the bug is in
iscsit_take_action_for_connection_exit() isn't it?

Coming to think of it, I think this code might be buggy since
iscsit_close_connection() frees the iscsi conn while conn->state_lock
might be dereferenced in iscsit_take_action_for_connection_exit().

I think that conn->state_lock is not sufficient to guarantee mutual
exclusion in iscsit_take_action_for_connection_exit(). We need a
dedicated lock for a TX/RX pair.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux