On Wed, Oct 13, 2021 at 09:58:32PM +0300, Konstantin Shelekhin wrote:
> On Wed, Oct 13, 2021 at 01:30:32PM -0500, Mike Christie wrote:
> > On 10/13/21 1:24 PM, Mike Christie wrote:
> > > On 10/13/21 1:08 PM, Konstantin Shelekhin wrote:
> > >> On Wed, Oct 13, 2021 at 12:51:32PM -0500, Mike Christie wrote:
> > >>> On 10/13/21 8:21 AM, Konstantin Shelekhin wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I really need the collective wisdom.
> > >>>>
> > >>>> Not long ago we uncovered a problem with iSCSI ABORT TASK handling.
> > >>>> Currently it's not possible to abort a WRITE_10 command in the
> > >>>> TRANSPORT_WRITE_PENDING state, because ABORT TASK will hang itself in
> > >>>> the process:
> > >>>>
> > >>>> # dmesg | tail -2
> > >>>> [   83.563505] ABORT_TASK: Found referenced iSCSI task_tag: 3372979269
> > >>>> [   84.593545] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus <nexus>
> > >>>>
> > >>>> # ps aux | awk '$8 ~/D/'
> > >>>> root        32  0.0  0.0  0  0  ?  D  15:19  0:00  [kworker/0:1+events]
> > >>>> root      1187  0.0  0.0  0  0  ?  D  15:20  0:00  [iscsi_ttx]
> > >>>>
> > >>>> # cat /proc/32/stack
> > >>>> [<0>] target_put_cmd_and_wait+0x68/0xa0
> > >>>> [<0>] core_tmr_abort_task.cold+0x16b/0x192
> > >>>> [<0>] target_tmr_work+0x9e/0xe0
> > >>>> [<0>] process_one_work+0x1d4/0x370
> > >>>> [<0>] worker_thread+0x48/0x3d0
> > >>>> [<0>] kthread+0x122/0x140
> > >>>> [<0>] ret_from_fork+0x22/0x30
> > >>>>
> > >>>> # cat /proc/1187/stack
> > >>>> [<0>] __transport_wait_for_tasks+0xaf/0x100
> > >>>> [<0>] transport_generic_free_cmd+0xe9/0x180
> > >>>> [<0>] iscsit_free_cmd+0x50/0xb0
> > >>>> [<0>] iscsit_close_connection+0x47d/0x8c0
> > >>>> [<0>] iscsit_take_action_for_connection_exit+0x6f/0xf0
> > >>>> [<0>] iscsi_target_tx_thread+0x184/0x200
> > >>>> [<0>] kthread+0x122/0x140
> > >>>> [<0>] ret_from_fork+0x22/0x30
> > >>>>
> > >>>> What happens:
> > >>>>
> > >>>> 1. Initiator sends a WRITE_10 CDB
> > >>>> 2. Target parses the CDB and sends R2T
> > >>>> 3. Target starts the Data-Out timer
> > >>>> 4. Initiator sends ABORT TASK; no new data from Initiator after this
> > >>>> 5. Target starts aborting WRITE_10, gets into core_tmr_abort_task()
> > >>>>    and starts waiting for the request completion
> > >>>> 6. Nothing happens
> > >>>> 7. The Data-Out timer expires, connection teardown starts and gets
> > >>>>    stuck waiting for the ABORT TASK, which waits for WRITE_10
> > >>>>
> > >>>> The ABORT TASK processing looks roughly like this:
> > >>>>
> > >>>> iscsi_rx_opcode
> > >>>>   iscsi_handle_task_mgt_cmd
> > >>>>     iscsi_tmr_abort_task
> > >>>>     transport_generic_handle_tmr
> > >>>>       if (tmr_cmd->transport_state & CMD_T_ABORTED)
> > >>>>         target_handle_abort
> > >>>>       else
> > >>>>         target_tmr_work
> > >>>>           if (tmr_cmd->transport_state & CMD_T_ABORTED)
> > >>>>             target_handle_abort
> > >>>>           else
> > >>>>             core_tmr_abort_task
> > >>>>               ret = __target_check_io_state
> > >>>>                 if (write_cmd->transport_state & CMD_T_STOP)
> > >>>>                   return -1
> > >>>>                 write_cmd->transport_state |= CMD_T_ABORTED
> > >>>>                 return 0
> > >>>>               if (!ret)
> > >>>>                 list_move_tail(&write_cmd->state_list, &aborted)
> > >>>>                 target_put_cmd_and_wait(&write_cmd)
> > >>>>
> > >>>> As I see it, the main problem is that the abort path can't initiate the
> > >>>> command termination; it simply waits for the request to handle this on
> > >>>> the execution path, like in target_execute_cmd():
> > >>>>
> > >>>> target_execute_cmd
> > >>>>   target_cmd_interrupted
> > >>>>     INIT_WORK(&cmd->work, target_abort_work)
> > >>>>
> > >>>> However, in this case the request is not going to be executed, because
> > >>>> the Initiator will not send the Data-Out buffer.
> > >>>>
> > >>>> I have a couple of ideas on how to fix this, but they all look kinda
> > >>>> ugly. The one that currently works around this for me:
> > >>>>
> > >>>> core_tmr_abort_task():
> > >>>>
> > >>>>   [...]
> > >>>>
> > >>>>   spin_lock_irqsave(&se_cmd->t_state_lock, flags);
> > >>>>   write_pending = se_cmd->t_state == TRANSPORT_WRITE_PENDING;
> > >>>>   spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> > >>>>
> > >>>>   if (write_pending && se_cmd->se_tfo->abort_write_pending)
> > >>>>           se_cmd->se_tfo->abort_write_pending(se_cmd);
> > >>>>
> > >>>>   target_put_cmd_and_wait(se_cmd);
> > >>>>
> > >>>>   [...]
> > >>>>
> > >>>> The new method abort_write_pending() is defined only for iSCSI and calls
> > >>>> target_handle_abort(). However, this opens up another can of worms,
> > >>>> because this code heavily races with R2T sending and requires a couple
> > >>>> of checks to "work most of the time". Not ideal, by far.
> > >>>>
> > >>>> I can make this one better by introducing R2T list draining that ensures
> > >>>> the proper order during cleanup, but maybe there is a much easier way
> > >>>> that I'm not seeing here.
> > >>>
> > >>> Ccing Maurizio to make sure I don't add his original bug back.
> > >>>
> > >>> If I understand you, I think I added this bug in:
> > >>>
> > >>> commit f36199355c64a39fe82cfddc7623d827c7e050da
> > >>> Author: Mike Christie <michael.christie@xxxxxxxxxx>
> > >>> Date:   Fri Nov 13 19:46:18 2020 -0600
> > >>>
> > >>>     scsi: target: iscsi: Fix cmd abort fabric stop race
> > >>>
> > >>> With that patch, if the abort or a LUN reset has got to LIO core, then
> > >>> we are going to be stuck waiting for the data, which won't come because
> > >>> we killed the iscsi threads.
> > >>>
> > >>> We can go back to always having the iscsi target clean up the cmd, but
> > >>> if LIO has started to abort the cmd, we take an extra ref so we don't
> > >>> free the cmd from under each other.
> > >>>
> > >>> This patch is completely untested:
> > >>>
> > >>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> > >>> index 2c54c5d8412d..d221e9be7468 100644
> > >>> --- a/drivers/target/iscsi/iscsi_target.c
> > >>> +++ b/drivers/target/iscsi/iscsi_target.c
> > >>> @@ -4090,12 +4090,13 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
> > >>>  		spin_lock_irq(&se_cmd->t_state_lock);
> > >>>  		if (se_cmd->transport_state & CMD_T_ABORTED) {
> > >>>  			/*
> > >>> -			 * LIO's abort path owns the cleanup for this,
> > >>> -			 * so put it back on the list and let
> > >>> -			 * aborted_task handle it.
> > >>> +			 * The LIO TMR handler owns the cmd but if
> > >>> +			 * we were waiting for data from the initiator
> > >>> +			 * then we need to internally cleanup to be
> > >>> +			 * able to complete it. Get an extra ref so
> > >>> +			 * we don't free the cmd from under LIO core.
> > >>>  			 */
> > >>> -			list_move_tail(&cmd->i_conn_node,
> > >>> -				       &conn->conn_cmd_list);
> > >>> +			target_get_sess_cmd(se_cmd, false);
> > >>>  		} else {
> > >>>  			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
> > >>>  		}
> > >>
> > >> The bug was there before. I had to backport this patch in order to
> > >> introduce my fix. I can revert it and check what is different, but it's
> > >> there in some form.
> > >>
> > >
> > > Don't waste your time. It's because iscsit_free_cmd's call to
> > > transport_generic_free_cmd has wait_for_tasks=true.
> > >
> > > We then do transport_generic_free_cmd -> target_wait_free_cmd ->
> > > __transport_wait_for_tasks like you posted above.
> >
> > That's wrong, it's the transport_generic_free_cmd wait:
> >
> >   if (aborted) {
> >           pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
> >           wait_for_completion(&compl);
> >           ret = 1;
>
> I'm not sure I'm following you. The wait on master comes from this:
>
> core_tmr_abort_task():
>   [...]
>   target_put_cmd_and_wait(se_cmd);
>   [...]
>
> IMO it's a kinda legit wait. And we can't just drop the references and
> call it a day, because a request has to go through target_handle_abort(),
> since it (at least) does some TAS magic. And currently there is no code
> that calls target_handle_abort() for WRITE_PENDING requests.

Ah, sorry, I should've mentioned it, I guess. The transport_generic_free_cmd()
bit is the connection teardown waiting on the ABORT TASK, which in turn waits
for the WRITE_10 task. That part is completely fine IMO.
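
To be explicit about which wait I mean on the TMR side: target_put_cmd_and_wait()
is roughly the following (paraphrasing, not quoting the exact code, so the
completion field name may be off). The completion only fires from the final
kref put of the command, and for a command sitting in TRANSPORT_WRITE_PENDING
that put never comes, because the Data-Out that would drive the command
forward never arrives:

  static void target_put_cmd_and_wait(struct se_cmd *cmd)
  {
          DECLARE_COMPLETION_ONSTACK(compl);

          cmd->abrt_compl = &compl;       /* completed from the final kref put */
          target_put_sess_cmd(cmd);       /* drop the TMR path's own reference */
          wait_for_completion(&compl);    /* <- the kworker in the trace sits here */
  }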
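
And to make the abort_write_pending() idea from the quoted part above a bit
more concrete, the iSCSI side of my hack looks roughly like this. This is only
an illustrative sketch: iscsit_abort_write_pending() is just what I call the
handler locally, the se_tfo callback is the one from the core_tmr_abort_task()
snippet above, it assumes target_handle_abort() is made callable from fabric
modules, and it ignores the races with R2T sending that I mentioned:

  /*
   * Sketch only. The new fabric callback would sit in
   * struct target_core_fabric_ops (include/target/target_core_fabric.h):
   *
   *         void (*abort_write_pending)(struct se_cmd *se_cmd);
   *
   * and get wired up in the iSCSI fabric ops.
   */
  static void iscsit_abort_write_pending(struct se_cmd *se_cmd)
  {
          struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd, se_cmd);

          pr_debug("Aborting WRITE_PENDING ITT: 0x%08x\n", cmd->init_task_tag);

          /*
           * No Data-Out will ever arrive for this command, so drive the
           * abort completion from here instead of waiting for the
           * execution path to do it.
           */
          target_handle_abort(se_cmd);
  }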