Re: iSCSI Abort Task and WRITE PENDING

Mike Christie <michael.christie@xxxxxxxxxx> · Wed, 13 Oct 2021 12:51:32 -0500

On 10/13/21 8:21 AM, Konstantin Shelekhin wrote:
> Hi,
> 
> I really need the collective wisdom.
> 
> Not long ago we've uncovered the problem with iSCSI and ABORT TASK
> handling. Currently it's not possible to abort a WRITE_10 command in
> TRANSPORT_WRITE_PENDING state, because ABORT TASK  will hang itself in
> the process:
> 
>   # dmesg | tail -2
>   [   83.563505] ABORT_TASK: Found referenced iSCSI task_tag: 3372979269
>   [   84.593545] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus <nexus>
> 
>   # ps aux | awk '$8 ~/D/'
>   root        32  0.0  0.0      0     0 ?        D    15:19   0:00 [kworker/0:1+events]
>   root      1187  0.0  0.0      0     0 ?        D    15:20   0:00 [iscsi_ttx]
> 
>   # cat /proc/32/stack
>   [<0>] target_put_cmd_and_wait+0x68/0xa0
>   [<0>] core_tmr_abort_task.cold+0x16b/0x192
>   [<0>] target_tmr_work+0x9e/0xe0
>   [<0>] process_one_work+0x1d4/0x370
>   [<0>] worker_thread+0x48/0x3d0
>   [<0>] kthread+0x122/0x140
>   [<0>] ret_from_fork+0x22/0x30
> 
>   # cat /proc/1187/stack
>   [<0>] __transport_wait_for_tasks+0xaf/0x100
>   [<0>] transport_generic_free_cmd+0xe9/0x180
>   [<0>] iscsit_free_cmd+0x50/0xb0
>   [<0>] iscsit_close_connection+0x47d/0x8c0
>   [<0>] iscsit_take_action_for_connection_exit+0x6f/0xf0
>   [<0>] iscsi_target_tx_thread+0x184/0x200
>   [<0>] kthread+0x122/0x140
>   [<0>] ret_from_fork+0x22/0x30
> 
> What happens:
> 
>   1. Initiator sends WRITE_10 CDB
>   2. Target parses the CDB and sends R2T
>   3. Target starts the Data-Out timer
>   4. Initiator sends ABORT TASK; no new data from Initiator after this
>   5. Target starts aborting WRITE_10, gets into core_tmr_abort_task()
>      and starts waiting for the request completion
>   6. Nothing happens
>   7. The Data-Out timers expires, connection teardown starts and gets
>      stuck waiting for ABORT TASK that waits for WRITE_10
> 
> The ABORT TASK processing looks roughly like this:
> 
>   iscsi_rx_opcode
>     iscsi_handle_task_mgt_cmd
>       iscsi_tmr_abort_task
>       transport_generic_handle_tmr
>         if (tmr_cmd->transport_state & CMD_T_ABORTED)
>           target_handle_abort
>         else
>           target_tmr_work
>             if (tmr_cmd->transport_state & CMD_T_ABORTED)
>               target_handle_abort
>             else
>               core_tmr_abort_task
>                 ret = __target_check_io_state
>                   if (write_cmd->transport_state & CMD_T_STOP)
>                     return -1
>                   write_cmd->transport_state |= CMD_T_ABORTED
>                   return 0
>                 if (!ret)
>                   list_move_tail(&write_cmd->state_list, &aborted)
>                   target_put_cmd_and_wait(&write_cmd)
> 
> As I see it, the main problem is that the abort path can't initiate the
> command termination, it simply waits for the request to handle this on
> the execution path like in target_execute_cmd():
> 
>   target_execute_cmd
>     target_cmd_interrupted
>       INIT_WORK(&cmd->work, target_abort_work)
> 
> However, in this case the request is not going to be executed because
> Initiator will not send the Data-Out buffer.
> 
> I have a couple of ideas on how to fix this, but they all look kinda
> ugly. The one that currently works around this for me:
> 
>   core_tmr_abort_task():
> 
>     [...]
> 
>     spin_lock_irqsave(&se_cmd->t_state_lock, flags);
>     write_pending = se_cmd->t_state == TRANSPORT_WRITE_PENDING;
>     spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
>     
>     if (write_pending && se_cmd->se_tfo->abort_write_pending)
>             se_cmd->se_tfo->abort_write_pending(se_cmd);
>     
>     target_put_cmd_and_wait(se_cmd);
> 
>     [...]
> 
> The new method abort_write_pending() is defined only for iSCSI and calls
> target_handle_abort(). However, this opens up another can of worms
> because this code heavily races with R2T sending and requires a couple
> of checks to "work most of the time". Not ideal, by far.
> 
> I can make this one better by introducing R2T list draining that ensures
> the proper order during cleanup, but maybe there is a much easier way
> that I'm not seeing here.

Ccing Maurizio to make sure I don't add his original bug back.

If I understand you, I think I added this bug in:

commit f36199355c64a39fe82cfddc7623d827c7e050da
Author: Mike Christie <michael.christie@xxxxxxxxxx>
Date:   Fri Nov 13 19:46:18 2020 -0600

    scsi: target: iscsi: Fix cmd abort fabric stop race

With that patch if the abort or a lun reset has got to lio core then we
are going to be stuck waiting for the data which won't come because we
killed the iscsi threads.

Can go back to always having the iscsi target clean up the cmd, but if
LIO has started to abort the cmd we take an extra ref so we don't free
the cmd from under each other.

This patch is completely untested:

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 2c54c5d8412d..d221e9be7468 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4090,12 +4090,13 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
 			spin_lock_irq(&se_cmd->t_state_lock);
 			if (se_cmd->transport_state & CMD_T_ABORTED) {
 				/*
-				 * LIO's abort path owns the cleanup for this,
-				 * so put it back on the list and let
-				 * aborted_task handle it.
+				 * The LIO TMR handler owns the cmd but if
+				 * we were waiting for data from the initiator
+				 * then we need to internally cleanup to be
+				 * able to complete it. Get an extra ref so
+				 * we don't free the cmd from under LIO core.
 				 */
-				list_move_tail(&cmd->i_conn_node,
-					       &conn->conn_cmd_list);
+				target_get_sess_cmd(se_cmd, false);
 			} else {
 				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
 			}