Hi Sagi,

Apologies for the delayed response.. Comments below.

On Sat, 2015-02-14 at 12:21 +0200, Sagi Grimberg wrote:
> On 2/12/2015 10:47 AM, Nicholas A. Bellinger wrote:
> > On Wed, 2015-02-11 at 10:17 +0200, Sagi Grimberg wrote:
> >> Hey Nic,
> >>
> >> So Our QA guys recently stepped on this bug when performing stress
> >> login-logout from a single initiator to 10 targets each exposed over
> >> 4 portals, so overall 40 sessions (needless to say we are talking on
> >> iser...). So there are lots of logins in parallel with lots of logouts.
> >>
> >> It seems that the connection termination causes iscsi_tx_thread to
> >> access the connection after it is freed or something (list corruption
> >> probably coming from iscsit_handle_immediate_queue or
> >> iscsit_handle_response_queue, and NULL deref coming from
> >> iscsit_take_action_for_connection_exit).
> >>
> >> Note, isert_wait_conn waits for session commands and QP flush which is
> >> normally pretty fast, the conn termination is done in a work that waits
> >> for DISCONNECTED event which might take longer (which is why we do it
> >> outside wait_conn context to avoid blocking it).
> >>
> >> I didn't get too far with this until now, do you have any idea on what
> >> might have happened?
> >
> > Mmm, it looks like iscsit_take_action_for_connection_exit() in TX thread
> > context is calling iscsi_close_connection() after hitting the following
> > check in iscsi_target_erl0.c:
> >
> >     if (conn->conn_state == TARG_CONN_STATE_IN_LOGOUT) {
> >             spin_unlock_bh(&conn->state_lock);
> >             iscsit_close_connection(conn);
> >             return;
> >     }
> >
> > .. once iscsi_close_connection() has already being called earlier by
> > iser-target code.
>
> Not sure I understand where iscsit_close_connection is called earlier
> by iser target. The iser code usually only notifies any problems to the
> iscsi layer to do it's thing.
>
> Care to explain how iscsit_close_connection might be called twice?
>

It appears iscsit_close_connection() is getting invoked first from
iscsi_trx context, after isert_cq_comp_err() has previously called
iscsit_cause_connection_reinstatement() to force a connection failure
during an explicit logout + ISCSI_LOGOUT_REASON_CLOSE_SESSION operation.

You can tell because the isert_wait_conn() + isert_wait4cmds() debug
output appears before the list_del corruption in iscsi_ttx context, and
isert_wait_conn() can only be invoked via iscsit_close_connection() ->
transport->wait_for_conn() -> isert_wait_conn().

Once iscsi_ttx context runs, it hits the TARG_CONN_STATE_IN_LOGOUT state
check in iscsit_take_action_for_connection_exit() and re-invokes
iscsit_close_connection(). This happens after iscsit_logout_closesession()
from isert_rx_completion() context has handled REASON_CLOSE_SESSION and
changed the connection state to IN_LOGOUT, but before the logout response
was posted and successfully completed in isert_do_control_comp().

--nab
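To make the ordering above easier to follow, here is a minimal user-space
sketch that replays it. This is a toy model only, not kernel code: every
toy_* name is hypothetical, there is no locking or threading, and the
close path just counts how often it runs. It only shows that the IN_LOGOUT
check in the TX exit path leads to a second close once the RX side has
already closed the connection.

```c
#include <assert.h>

/* Toy model only: all toy_* names are hypothetical, not kernel symbols. */
enum toy_conn_state { TOY_LOGGED_IN, TOY_IN_LOGOUT };

struct toy_conn {
	enum toy_conn_state state;
	int close_calls;	/* how many times the close path has run */
};

/* Stand-in for the connection close path; here it only counts calls. */
static void toy_close_connection(struct toy_conn *c)
{
	c->close_calls++;
}

/* RX completion handling a CLOSE_SESSION logout: state becomes IN_LOGOUT. */
static void toy_rx_logout_closesession(struct toy_conn *c)
{
	c->state = TOY_IN_LOGOUT;
}

/* Completion error forces reinstatement, so the RX thread closes first. */
static void toy_rx_thread_exit(struct toy_conn *c)
{
	toy_close_connection(c);
}

/* TX thread exit path, modelled on the quoted IN_LOGOUT check. */
static void toy_tx_thread_exit(struct toy_conn *c)
{
	if (c->state == TOY_IN_LOGOUT) {
		toy_close_connection(c);
		return;
	}
}

/* Replay the described ordering and report how often close ran. */
int toy_replay(void)
{
	struct toy_conn c = { TOY_LOGGED_IN, 0 };

	toy_rx_logout_closesession(&c);	/* logout request handled */
	toy_rx_thread_exit(&c);		/* first close, from the RX side */
	assert(c.state == TOY_IN_LOGOUT);	/* logout response not yet done */
	toy_tx_thread_exit(&c);		/* second close, from the TX side */
	return c.close_calls;
}
```

Calling toy_replay() returns the number of close calls for this ordering;
in the real code the second call operates on a connection that has already
been torn down, which would be consistent with the list_del corruption.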