On Wed, Jan 11, 2023 at 09:08:28PM -0600, Mike Christie wrote: > > If we get a disconnect event while logging in we can end up in a state > where will never be able to relogin. This happens when: > > 1. login thread has put us into TARG_CONN_STATE_IN_LOGIN > 2. isert then does > > isert_disconnected_handler -> iscsit_cause_connection_reinstatement > > which sets the conn connection_reinstatement flag. Nothing else happens > because we are only in IN_LOGIN. The tx/rx threads are not running yet > so we can't start recovery from those contexts at this time. > > 3. The login thread finishes processing the login pdu and thinks login is > done. It sets us into TARG_CONN_STATE_LOGGED_IN/TARG_SESS_STATE_LOGGED_IN. > This starts the rx/tx threads. > > 4. The initiator thought it disconnected the connection at 2, and has > since sent a new connect which is now handled. This leads us to eventually > run: > > iscsi_check_for_session_reinstatement-> iscsit_stop_session -> > iscsit_cause_connection_reinstatement > > iscsit_stop_session sees the old conn and does > iscsit_cause_connection_reinstatement which sees connection_reinstatement > is set so it just returns instead of trying to kill the tx/rx threads > which would have caused recovery to start. > > 5. iscsit_stop_session then waits on session_wait_comp which will never > happen since we didn't kill the tx/rx threads. > > This has the iscsit login code check if a fabric driver ran > iscsit_cause_connection_reinstatement during the login process similar > to how we check for the sk state for tcp based iscsit. This will prevent > us from getting into 3 and creating a ghost session. > > Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> > --- > drivers/target/iscsi/iscsi_target_nego.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c > index ff49c8f3fe24..2dd81752d4c9 100644 > --- a/drivers/target/iscsi/iscsi_target_nego.c > +++ b/drivers/target/iscsi/iscsi_target_nego.c > @@ -350,6 +350,16 @@ static int iscsi_target_do_tx_login_io(struct iscsit_conn *conn, struct iscsi_lo > ISCSI_LOGIN_STATUS_NO_RESOURCES); > return -1; > } > + > + /* > + * isert doesn't know the iscsit state and uses > + * iscsit_cause_connection_reinstatement as a generic error > + * notification system. It may call it before we are in FFP. > + * Handle this now in case it signaled a failure before the > + * rx/tx threads were up and could start recovery. > + */ > + if (atomic_read(&conn->connection_reinstatement)) > + goto err; Why only for login->login_complete case? In other case the session will not hang? Will it be droppped on login timeout or something else? May be the root cause is point 2 itself - calling iscsit_cause_connection_reinstatement in not ISER_CONN_FULL_FEATURE state where there are no TX_RX threads? I mean that was a misuse of iscsit_cause_connection_reinstatement? > } > > if (conn->conn_transport->iscsit_put_login_tx(conn, login, > -- > 2.31.1 > >