Re: iSCSI Abort Task and WRITE PENDING

michael.christie@xxxxxxxxxx · Thu, 14 Oct 2021 22:18:13 -0500

On 10/14/21 6:12 PM, Konstantin Shelekhin wrote:
> On Wed, Oct 13, 2021 at 03:21:19PM -0500, Mike Christie wrote:
>> For lun resets, let's take the abort case.
>>
>> 1. core_tmr_abort_task is waiting in target_put_cmd_and_wait for the WRITE.
>> 2. For the normal case where a cmd has been passed to LIO core then eventually
>> the cmd completes and we do:
>>
>> target_complete_cmd -> target_complete_cmd_with_sense -> target_cmd_interrupted
>> -> target_handle_abort. Ignoring TAS, that function then does the last puts on the
>> cmd which wakes up the TMR code in #1.
>>
>> For the case where we never submit the cmd to LIO core, then the puts will never
>> be done by that target_complete_cmd path. And, in this case where the iscsi connection
>> is closed we know the cmd will never be submitted to LIO core because the iscsi
>> threads have been killed. iscsi needs to do the last puts on the cmd so #1 wakes up
>> on that WRITE.
>>
>> So we need a cmd bit to set to indicate iscsi has called one of the submission
>> /execution functions in LIO. If it's not set then iscsit_release_commands_from_conn/
>> iscsit_free_cmd needs to do those last puts.
>>
>> Here in this example to better show what I mean, we detect if an abort has been sent
>> to LIO core for a cmd that has not been sent to LIO core. If that happens then
>> iscsit_free_cmd gets force_cleanup=true so transport_generic_free_cmd will do a put
>> and not wait, and then iscsit_free_cmd does the last target_put_sess_cmd which
>> will wake #1 above. The abort will then complete too.
>>
>> Note:
>> I don't think we have to worry about TAS in this case, because we never got a complete
>> SCSI command. The iscsi layer only got part of it and never submitted it to the SCSI
>> layer. We then escalated to session level recovery so we are not sending any responses
>> for any of the cmds or TMFs. The transport is killed at this point, so no responses
>> can even be sent.
>>
>>
>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>> index 2c54c5d8412d..d0e80a2b653b 100644
>> --- a/drivers/target/iscsi/iscsi_target.c
>> +++ b/drivers/target/iscsi/iscsi_target.c
>> @@ -4088,7 +4088,8 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>>  
>>  		if (se_cmd->se_tfo != NULL) {
>>  			spin_lock_irq(&se_cmd->t_state_lock);
>> -			if (se_cmd->transport_state & CMD_T_ABORTED) {
>> +			if (se_cmd->transport_state & CMD_T_ABORTED &&
>> +			    se_cmd->transport_state & CMD_T_SUBMITTED) {
>>  				/*
>>  				 * LIO's abort path owns the cleanup for this,
>>  				 * so put it back on the list and let
>> @@ -4096,7 +4097,7 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>>  				 */
>>  				list_move_tail(&cmd->i_conn_node,
>>  					       &conn->conn_cmd_list);
>> -			} else {
>> +			} else if (!(se_cmd->transport_state & CMD_T_ABORTED)) {
>>  				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
>>  			}
>>  			spin_unlock_irq(&se_cmd->t_state_lock);
>> @@ -4108,8 +4109,12 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>>  		list_del_init(&cmd->i_conn_node);
>>  
>>  		iscsit_increment_maxcmdsn(cmd, sess);
>> -		iscsit_free_cmd(cmd, true);
>>  
>> +		if (se_cmd->transport_state & CMD_T_ABORTED &&
>> +		    !(se_cmd->transport_state & CMD_T_SUBMITTED))
>> +			iscsit_free_cmd(cmd, false, true);
>> +		else
>> +			iscsit_free_cmd(cmd, true, false);
>>  	}
>>  }
>>  
>> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
>> index 6dd5810e2af1..931586595044 100644
>> --- a/drivers/target/iscsi/iscsi_target_util.c
>> +++ b/drivers/target/iscsi/iscsi_target_util.c
>> @@ -742,7 +742,7 @@ void __iscsit_free_cmd(struct iscsi_cmd *cmd, bool check_queues)
>>  		conn->conn_transport->iscsit_unmap_cmd(conn, cmd);
>>  }
>>  
>> -void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown)
>> +void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown, bool force_cleanup)
>>  {
>>  	struct se_cmd *se_cmd = cmd->se_cmd.se_tfo ? &cmd->se_cmd : NULL;
>>  	int rc;
>> @@ -751,10 +751,14 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown)
>>  
>>  	__iscsit_free_cmd(cmd, shutdown);
>>  	if (se_cmd) {
>> -		rc = transport_generic_free_cmd(se_cmd, shutdown);
>> +		rc = transport_generic_free_cmd(se_cmd,
>> +					force_cleanup ? false : shutdown);
>>  		if (!rc && shutdown && se_cmd->se_sess) {
>>  			__iscsit_free_cmd(cmd, shutdown);
>>  			target_put_sess_cmd(se_cmd);
>> +		} else if (se_cmd->sess && force_cleanup) {
>> +			__iscsit_free_cmd(cmd, true);
>> +			target_put_sess_cmd(se_cmd);
>>  		}
>>  	} else {
>>  		iscsit_release_cmd(cmd);
>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
>> index 14c6f2bb1b01..eb233ea8db65 100644
>> --- a/drivers/target/target_core_transport.c
>> +++ b/drivers/target/target_core_transport.c
>> @@ -1554,7 +1554,7 @@ int transport_handle_cdb_direct(
>>  	 * this to be called for initial descriptor submission.
>>  	 */
>>  	cmd->t_state = TRANSPORT_NEW_CMD;
>> -	cmd->transport_state |= CMD_T_ACTIVE;
>> +	cmd->transport_state |= (CMD_T_ACTIVE | CMD_T_SUBMITTED);
>>  
>>  	/*
>>  	 * transport_generic_new_cmd() is already handling QUEUE_FULL,
>> @@ -2221,7 +2221,7 @@ void target_execute_cmd(struct se_cmd *cmd)
>>  		return;
>>  
>>  	spin_lock_irq(&cmd->t_state_lock);
>> -	cmd->t_state = TRANSPORT_PROCESSING;
>> +	cmd->t_state = (TRANSPORT_PROCESSING | CMD_T_SUBMITTED);
>>  	cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
>>  	spin_unlock_irq(&cmd->t_state_lock);
>>  
>> diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
>> index fb11c7693b25..b759ec810fa9 100644
>> --- a/include/target/target_core_base.h
>> +++ b/include/target/target_core_base.h
>> @@ -511,6 +511,7 @@ struct se_cmd {
>>  #define CMD_T_COMPLETE		(1 << 2)
>>  #define CMD_T_SENT		(1 << 4)
>>  #define CMD_T_STOP		(1 << 5)
>> +#define CMD_T_SUBMITTED		(1 << 6)
>>  #define CMD_T_TAS		(1 << 10)
>>  #define CMD_T_FABRIC_STOP	(1 << 11)
>>  	spinlock_t		t_state_lock;
>>
>>
>>
>>
>>
> 
> If I understand this aproach correctly, it fixes the deadlock, but the
> connection reinstatement will still happen, because WRITE_10 won't be
> aborted and the connection will go down after the timeout.> 
> IMO it's not ideal either, since now iSCSI will have a 50% chance to
> have the connection (meaning SCSI session) killed on arbitrary ABOR

I wouldn't call this an arbitrary abort. It's indicating a problem.
When do you see this? Why do we need to fix it per cmd? Are you hitting
the big command short timeout issue? Driver/fw bug?

> TASK. While I'm sure most initiators will be able to recover from this
> event, such drastic measures will certanly cause a lot of confusion for
> people who are not familiar with TCM internals
How will this cause confusion vs the case where the cmd reaches the target
and we are waiting for it on the backend? In both cases, the initiator sends
an abort, it times out, the initiator or target drop the connection, we
relogin. Every initiator handles this.

With that said I am in favor of you fixing the code so we can cleanup
a partially sent cmd if it can be done sanely.

I personally would just leave the current behavior and fix the deadlock
because:

1. When I see this happening it's normally the network so we have to blow
away the group of commands and we end up dropping the connection one way
or another. I don't see the big command short timeout case often anymore.

2. Initiators just did not implement this right. I know this for sure
for open-iscsi at least. I started to fix my screw ups the other day but it
ends up breaking the targets.

For example,

- If we've sent a R2T and the initiator has sent a LUN RESET, what are
you going to have the target do? Send the response right away?

- If we've sent a R2T and the initiator has sent some of the data
PDUs to full fill it but has not sent the final PDU, then it sends the
LUN RESET, what do you do?

- You also have the immediate data case and the InitialR2T case.

The updated specs clarify how to handle this, and even have a FastAbort
key to specify which behavior we are going to do. But we don't support
it and I don't think many people implemented it.

So you are going to get a mix of behavior. Some initiators will send the
RESET and still send the data out PDUs and some will just stop sending
data outs after the RESET. To be safe do you wait for the initiator to
complete the sequence of data out PDUs? If so then you probably just hit
the same issue where we don't get the needed PDUs and the one side drops
the connection. If we send the ABORT response while the initiator is
still sending data outs, then we risk breaking them.

If you want to do it then go for it, but to answer you original email's
question the only easy way out is to just let it time out :)