Re: [PATCH v2 09/13] scsi: target: iscsit: Fix isert disconnect handling during login

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/23/23 3:53 AM, Sagi Grimberg wrote:
> 
> 
> On 1/18/23 04:14, Mike Christie wrote:
>> On 1/13/23 08:08, Dmitry Bogdanov wrote:
>>> On Wed, Jan 11, 2023 at 09:08:28PM -0600, Mike Christie wrote:
>>>>
>>>> If we get a disconnect event while logging in we can end up in a state
>>>> where will never be able to relogin. This happens when:
>>>>
>>>> 1. login thread has put us into TARG_CONN_STATE_IN_LOGIN
>>>> 2. isert then does
>>>>
>>>> isert_disconnected_handler -> iscsit_cause_connection_reinstatement
>>>>
>>>> which sets the conn connection_reinstatement flag. Nothing else happens
>>>> because we are only in IN_LOGIN. The tx/rx threads are not running yet
>>>> so we can't start recovery from those contexts at this time.
>>>>
>>>> 3. The login thread finishes processing the login pdu and thinks login is
>>>> done. It sets us into TARG_CONN_STATE_LOGGED_IN/TARG_SESS_STATE_LOGGED_IN.
>>>> This starts the rx/tx threads.
>>>>
>>>> 4. The initiator thought it disconnected the connection at 2, and has
>>>> since sent a new connect which is now handled. This leads us to eventually
>>>> run:
>>>>
>>>> iscsi_check_for_session_reinstatement-> iscsit_stop_session ->
>>>> iscsit_cause_connection_reinstatement
>>>>
>>>> iscsit_stop_session sees the old conn and does
>>>> iscsit_cause_connection_reinstatement which sees connection_reinstatement
>>>> is set so it just returns instead of trying to kill the tx/rx threads
>>>> which would have caused recovery to start.
>>>>
>>>> 5. iscsit_stop_session then waits on session_wait_comp which will never
>>>> happen since we didn't kill the tx/rx threads.
>>>>
>>>> This has the iscsit login code check if a fabric driver ran
>>>> iscsit_cause_connection_reinstatement during the login process similar
>>>> to how we check for the sk state for tcp based iscsit. This will prevent
>>>> us from getting into 3 and creating a ghost session.
>>>>
>>>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
>>>> ---
>>>>   drivers/target/iscsi/iscsi_target_nego.c | 10 ++++++++++
>>>>   1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
>>>> index ff49c8f3fe24..2dd81752d4c9 100644
>>>> --- a/drivers/target/iscsi/iscsi_target_nego.c
>>>> +++ b/drivers/target/iscsi/iscsi_target_nego.c
>>>> @@ -350,6 +350,16 @@ static int iscsi_target_do_tx_login_io(struct iscsit_conn *conn, struct iscsi_lo
>>>>                                              ISCSI_LOGIN_STATUS_NO_RESOURCES);
>>>>                          return -1;
>>>>                  }
>>>> +
>>>> +               /*
>>>> +                * isert doesn't know the iscsit state and uses
>>>> +                * iscsit_cause_connection_reinstatement as a generic error
>>>> +                * notification system. It may call it before we are in FFP.
>>>> +                * Handle this now in case it signaled a failure before the
>>>> +                * rx/tx threads were up and could start recovery.
>>>> +                */
>>>> +               if (atomic_read(&conn->connection_reinstatement))
>>>> +                       goto err;
>>>
>>> Why only for login->login_complete case? In other case the session will
>>> not hang? Will it be droppped on login timeout or something else?
>>
>> It will not hang. If you hit an error with isert before we think we can go into
>> FFP then the login timeout currently cleans up the conn and session.
>>
>>>
>>> May be the root cause is point 2 itself - calling iscsit_cause_connection_reinstatement
>>> in not ISER_CONN_FULL_FEATURE state where there are no TX_RX threads?
>>> I mean that was a misuse of iscsit_cause_connection_reinstatement?
>>
>> Let me drop this patch for now. After writing the response above about normally it just
>> times out, and thinking about your question here I think to really fix this we want to
>> fully integrate isert login into iscsit.
>>
>> The root problem is that isert login is not integrated into iscsit, so there is really
>> no error handling. iscsit_cause_connection_reinstatement was supposed to do it, but it
>> doesn't do anything.
>>
>> So we need to separate the LOGIN_FLAGS_CLOSED tests and setting from the iscsi_target_sk*
>> code. Add a helper to set that bit and do some state checks, and make the checks generic
>> in the login code (not tied to having a socket). Convert iscsit and isert to use the new
>> helper. Then handle the LOGIN_FLAGS_READ_ACTIVE/LOGIN_FLAGS_WRITE_ACTIVE and login_work
>> stuff. Then review cxgb?
>>
>> I'll do that later after fixing the command cleanup stuff in this patchsdet and the
>> other patchsets I have outstanding.
> 
> Perhaps the fix is as simple as checking the iscsi state to be also full-featured and if not, do a simple error handling within
> iser_disconnected_handler?

Yeah that's what I'm thinking above. The catch was what is the simple error handler going to be.

For iscsit tcp it mostly just sets LOGIN_FLAGS_CLOSED to signal the failure. So I was saying
above if we could just expose that to isert, then all it has to do is call some helper iscsit
helper:

isert_disconnected_handler
...
        case ISER_CONN_BOUND:
		iscsit_login_failed();
		break;
	case ISER_CONN_FULL_FEATURE
		iscsit_cause_connection_reinstatement(isert_conn->conn, 0);

*I think isert_login_send_done would also need a change.

The iscsit code then would work similarly for both isert and iscsit tcp. There's then just
cxgb which I think just times out like isert.

The other option is for the simple case is just let it timeout like cxgb. And for isert
just do nothing:

        case ISER_CONN_BOUND:
		isert_err("Failure on unbound connection. Timing out login\n");
		break;
	case ISER_CONN_FULL_FEATURE
		iscsit_cause_connection_reinstatement(isert_conn->conn, 0);
		break;



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux