Re: [PATCH 1/3] target: iscsi: fix hang in the iSCSI login code

Mike Christie <michael.christie@xxxxxxxxxx> · Fri, 10 Mar 2023 13:53:12 -0600

On 3/10/23 4:04 AM, Maurizio Lombardi wrote:
> If the initiator suddenly stops sending data during a login while
> keeping the TCP connection open, the sk_data_ready callback won't
> schedule the login_work and the latter
> will never timeout and release the login semaphore.
> 
> All the other login operations will therefore get stuck waiting
> for the semaphore to be released.
> 

You mean np_login_sem right? Do you know why we have to serialize access
to the np during login? Is it just a simple way to handle the internal
target variables for things like MC/s, reinstatement, etc? I saw the tsih
case, but am wondering how easy it is to remove.

If we need the sem why do we use the sk_callback/non-blocking type of
approach when we can only do 1 login at a time per tpg? We end up
creating 2 threads per connection after FFP, so it seems like we would
have enough resources for a temp login thread before FFP. Looking over
the commit history, we seem to hit a good number of bugs in this code.

It seems like we could simplify the login code by do a blocking type of
implementation where:

1. __iscsi_target_login_thread starts a workstruct that runs
iscsi_target_start_negotiation.  It would then run iscsi_target_do_login_rx
which just waits for a response. When we get one, it does iscsi_target_do_login
and if we need more PDUs loops. We have one timer for all this.

2. We can remove the np_login_sem. It would be replaced by a workstruct
in the np. __iscsi_target_login_thread would just flush the work to make sure
we are not running a login on the same np already.

3. We can remove all the sk_callback* related code for iscsit tcp since
recvmsg just return failure when the state changes.

4. It looks like cxgbit will work with some small changes because I think it
just kicks off iscsi_target_do_login_rx after sending a login PDU, then it
just waits in its iscsit_get_login_rx.

5. It looks like isert will also work, because it's isert_get_login_rx can
just wait on the login_req_comp/login_comp already.