Re: iSCSI Abort Task and WRITE PENDING

Konstantin Shelekhin <k.shelekhin@xxxxxxxxx> · Mon, 18 Oct 2021 14:56:44 +0300

On Thu, Oct 14, 2021 at 10:18:13PM -0500, michael.christie@xxxxxxxxxx wrote:
> > If I understand this approach correctly, it fixes the deadlock, but the
> > connection reinstatement will still happen, because WRITE_10 won't be
> > aborted and the connection will go down after the timeout.
> >
> > IMO it's not ideal either, since now iSCSI will have a 50% chance to
> > have the connection (meaning SCSI session) killed on arbitrary ABORT
> 
> I wouldn't call this an arbitrary abort. It's indicating a problem.
> When do you see this? Why do we need to fix it per cmd? Are you hitting
> the big command short timeout issue? Driver/fw bug?

It was triggered by ESXi. During some heavy IOPS intervals the backend
device cannot handle the load and some IOs get stuck for more than 30
seconds. I suspect that ABORT TASKs are issued by the virtual machines.
So a series of ABORT TASK will come, and the unlucky one will hit the
issue.

> > TASK. While I'm sure most initiators will be able to recover from this
> > event, such drastic measures will certainly cause a lot of confusion for
> > people who are not familiar with TCM internals.
>
> How will this cause confusion vs the case where the cmd reaches the target
> and we are waiting for it on the backend? In both cases, the initiator sends
> an abort, it times out, the initiator or target drop the connection, we
> relogin. Every initiator handles this.

Because usually (when a WRITE request is past the WRITE PENDING state)
the ABORT TASK does not trigger relogin. In my experience the initiator
just waits for the TMR completion and goes on.

And from a blackbox perspective it looks suspicious:

  1. ABORT TASK sent to WRITE_10 tag 0x1; waits for its completion
  2. ABORT TASK sent to WRITE_10 tag 0x2; almost immediately the connection is dropped

The only difference between #1 and #2 is that the command 0x1 is past
the WRITE PENDING state.
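
To make the difference concrete, here is a toy model of what the initiator
side observes. It is only a sketch: the state names, the abort_outcome()
helper and the outcome strings are made up for illustration and do not
correspond to the actual TCM code paths.

```python
from enum import Enum, auto


class CmdState(Enum):
    # Hypothetical, simplified states; not the real TCM state machine.
    WRITE_PENDING = auto()  # target is still waiting for Data-Out PDUs
    IN_BACKEND = auto()     # all data received, command is in the backend


def abort_outcome(state: CmdState) -> str:
    """Return what the initiator observes after sending ABORT TASK."""
    if state is CmdState.WRITE_PENDING:
        # Aborting a command that still waits for data ends with the
        # connection going down and a relogin.
        return "connection dropped"
    # Past WRITE PENDING the initiator just waits for the TMR response.
    return "TMR completed"


if __name__ == "__main__":
    cases = ((0x1, CmdState.IN_BACKEND), (0x2, CmdState.WRITE_PENDING))
    for tag, state in cases:
        print(f"WRITE_10 tag {tag:#x}: {abort_outcome(state)}")
```

The two commands are identical from the initiator's point of view; only the
target-side state differs, and that state is not visible on the wire.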

> With that said I am in favor of you fixing the code so we can clean up
> a partially sent cmd if it can be done sanely.
> 
> I personally would just leave the current behavior and fix the deadlock
> because:
> 
> 1. When I see this happening it's normally the network so we have to blow
> away the group of commands and we end up dropping the connection one way
> or another. I don't see the big command short timeout case often anymore.
> 
> 2. Initiators just did not implement this right. I know this for sure
> for open-iscsi at least. I started to fix my screw ups the other day but it
> ends up breaking the targets.
> 
> For example,
> 
> - If we've sent a R2T and the initiator has sent a LUN RESET, what are
> you going to have the target do? Send the response right away?

AFAIR the spec says "nuke it, there will be no data after this".

> - If we've sent a R2T and the initiator has sent some of the data
> PDUs to fulfill it but has not sent the final PDU, then it sends the
> LUN RESET, what do you do?

The same. However, I understand the interoperability concerns. I'll
check what other targets do.

> - You also have the immediate data case and the InitialR2T case.

True.

> The updated specs clarify how to handle this, and even have a FastAbort
> key to specify which behavior we are going to do. But we don't support
> it and I don't think many people implemented it.
> 
> So you are going to get a mix of behavior. Some initiators will send the
> RESET and still send the data out PDUs and some will just stop sending
> data outs after the RESET. To be safe do you wait for the initiator to
> complete the sequence of data out PDUs? If so then you probably just hit
> the same issue where we don't get the needed PDUs and one side drops
> the connection. If we send the ABORT response while the initiator is
> still sending data outs, then we risk breaking them.
> 
> If you want to do it then go for it, but to answer your original email's
> question the only easy way out is to just let it time out :)

Sounds reasonable. I'll test your solution.
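
For reference while testing, the mix of behaviour described above can be
written down as a small decision table. This is a rough sketch, not target
code: the Initiator and TargetPolicy names and the outcome strings are
assumptions that only restate the cases from this thread.

```python
from enum import Enum


class Initiator(Enum):
    # Hypothetical labels for the two initiator behaviours.
    KEEPS_SENDING = "keeps sending Data-Out PDUs after the RESET"
    STOPS_SENDING = "stops sending Data-Out PDUs after the RESET"


class TargetPolicy(Enum):
    # Hypothetical labels for the two target choices.
    WAIT_FOR_DATA = "wait for the Data-Out sequence to complete"
    RESPOND_NOW = "send the TMR response right away"


def outcome(initiator: Initiator, policy: TargetPolicy) -> str:
    """Restate the expected result for one initiator/target combination."""
    if policy is TargetPolicy.WAIT_FOR_DATA:
        if initiator is Initiator.STOPS_SENDING:
            # The target waits for PDUs that never come.
            return "needed PDUs never arrive; one side drops the connection"
        return "sequence completes; response sent afterwards"
    if initiator is Initiator.KEEPS_SENDING:
        # The response is sent while Data-Out PDUs are still in flight.
        return "response sent during Data-Out; risk of breaking the initiator"
    return "response sent; no further data expected"


if __name__ == "__main__":
    for policy in TargetPolicy:
        for initiator in Initiator:
            print(f"{policy.name} / {initiator.name}: "
                  f"{outcome(initiator, policy)}")
```

Neither policy is safe for both kinds of initiator, which is why letting the
command time out is the only easy way out.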