On Monday, November 6, 2023 11:42 AM, Karan Tilak Kumar (kartilak) wrote:
>
> On Monday, October 23, 2023 11:54 PM, Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > Adding the fnic maintainers as they are probably most qualified to review and test this.
> >
> > On Mon, Oct 23, 2023 at 11:15:04AM +0200, Hannes Reinecke wrote:
> > > Allocate a reset command on the fly instead of relying on using the
> > > command which triggered the device failure.
> > > This might fail if all available tags are busy, but in that case
> > > it'll be safer to fall back to host reset anyway.
> > >
>
> Thanks for this fix, Hannes.
> I'm working on integrating these changes and testing them.
> I'll get back to you about this.
>

I integrated your patch set using "b4 shazam" and built all the changes to do some dev testing.

I instrumented the code to do the following:

- After one million IOs, drop one IO response.
- This will trigger an abort. Drop that abort response.
- This will trigger a device reset.

I'm seeing that the tag here is 0xFFFFFFFF (SCSI_NO_TAG).
This tag fails the out-of-range tag check and escalates to host reset.

I've been able to repro this three times with the same result.

The expectation with this test case is that the device reset should go through
successfully without escalating to host reset.

Regards,
Karan