Re: [PATCH 1/3] target: iscsi: fix hang in the iSCSI login code

Maurizio Lombardi <mlombard@xxxxxxxxxx> · Tue, 14 Mar 2023 12:09:03 +0100

út 14. 3. 2023 v 0:52 odesílatel Mike Christie
<michael.christie@xxxxxxxxxx> napsal:
>
> > +     case TCP_CLOSE:
> > +             pr_debug("__iscsi_target_sk_check_close: socket closing,"
> >                       "returning TRUE\n");
>
> Don't need to break up a string. We do it a lot in the lio code, but we've
> been trying not to in new code.
>
> > +             /*
> > +              * Restart the login timer to prevent the
> > +              * login process from getting stuck if the initiator
>
> I would fix up the formatting so the first line is longer.

Ok

> > @@ -1358,6 +1348,9 @@ int iscsi_target_start_negotiation(
> >               set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
> >               write_unlock_bh(&sk->sk_callback_lock);
> >       }
> > +
> > +     iscsit_start_login_timer(conn);
>
> At this time, we have the np->np_login_timer running right?

Yes.

>
> Don't we only need to start this new timer when we know there are
> more PDUs and the connection is good (iscsi_target_do_login returns
> 0 and iscsi_target_sk_check_and_clear returns 0)?

The moment iscsi_target_sk_check_and_clear() clears the
LOGIN_FLAGS_INITIAL_PDU flag
and returns 0, the login worker may be already running.
If we start the timer after the call to
iscsi_target_sk_check_and_clear(), we could have a race condition:

1) login_work runs and reschedules itself non-stop because
LOGIN_FLAGS_INITIAL_PDU is set
2) login kthread calls  iscsi_target_sk_check_and_clear() and clears
LOGIN_FLAGS_INITIAL_PDU
3) login work runs and completes the login
4) login kthread starts the timer
5) No one stops the timer, it fires and kills the connection despite
the fact the login was successful.

I could however replace this code:

ret = iscsi_target_do_login(conn, login);
 if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
           ret = -1;

with the following, if you like it more:

ret = iscsi_target_do_login(conn, login);
if (!ret) {
      iscsit_start_login_timer(conn);
      if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU)) {
           iscsit_stop_login_timer(conn);
           ret = -1;
      }
}

>
> I think you can just kill np timer and only use the login_timer for
> both cases. So I mean set the thread to kill as the login one and start
> this login_timer in __iscsi_target_login_thread where we used to call
> iscsi_start_login_thread_timer. You would then mod the timer when we
> transition from iscsi_target_start_negotiation to waiting for the next
> PDU.
>

Yes, maybe, but I would need to find a way to detect if conn->login_kworker
is pointing to the login thread or to the login_work's thread, because
the np_login_timer is supposed to clear the ISCSI_TF_RUNNING flag.

maybe something like this:

if (conn->login_kworker == conn->tpg_np->tpg_np->np_thread) {
     spin_lock_bh(&np->np_thread_lock);
     if (!(np->np_login_timer_flags & ISCSI_TF_STOP))
           np->np_login_timer_flags &= ~ISCSI_TF_RUNNING;
     spin_unlock_bh(&np->np_thread_lock);
}

> For isert and cxgbit we won't have conn->sock set so I think you need some
> sort of callout for those drivers, or maybe set LOGIN_FLAGS_CLOSED and queue
> the login_work. Maybe the latter will work for all drivers as well. You probably
> need some extra locking and LOGIN_FLAGS checks to handle an issue similar to
> below.

Hmm, that would need to be tested, because LOGIN_FLAGS_CLOSED is supposed
to be set when the socket is already in the process of getting closed
(it's state is TCP_CLOSE_WAIT or TCP_FIN_WAIT* or whatever)
So If I set LOGIN_FLAGS_CLOSED in the timer and the socket is
TCP_ESTABLISHED it means that I am trying to
do the opposite, will the socket be properly closed
by isert/cxgbit in this case?

>
> If we do need to do the kernel_sock_shutdown we might need to add some code
> to not call it twice (I'm not sure it's fully supported). It looks like we
> can race and do:

Correct, thanks. In that case a check like this should be sufficient.

if (!test_bit( LOGIN_FLAGS_CLOSED))
     kernel_sock_shutdown(sock, SHUT_RDWR);

Maurizio