Re: [PATCH 13/16] fnic: allocate device reset command on the fly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/14/23 03:29, Karan Tilak Kumar (kartilak) wrote:
On Monday, November 6, 2023 11:42 AM, Karan Tilak Kumar (kartilak) wrote:

On Monday, October 23, 2023 11:54 PM, Christoph Hellwig <hch@xxxxxx> wrote:

Adding the fnic maintainers as they are probably most qualified to review and test this.

On Mon, Oct 23, 2023 at 11:15:04AM +0200, Hannes Reinecke wrote:
Allocate a reset command on the fly instead of relying on using the
command which triggered the device failure.
This might fail if all available tags are busy, but in that case
it'll be safer to fall back to host reset anyway.


Thanks for this fix, Hannes.
I'm working on integrating these changes and testing them.
I'll get back to you about this.


I integrated your patch set using "b4 shazam" and built all the changes to do some dev testing.
I instrumented the code to do the following:

- After one million IOs, drop one IO response.
- This will trigger an abort. Drop that abort response.
- This will trigger a device reset. I'm seeing that the tag here is 0xFFFFFFFF (SCSI_NO_TAG).

This tag fails the out-of-range tag check and escalates to host reset.
I've been able to repro this three times with the same result.

The expectation with this test case is that the device reset should go through successfully without escalating to host reset.

Thanks for the report.
Certainly not what was intended with the patch. Can you try with this patch on top:

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 8909cf09cf9e..50eea6d2344a 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -2210,7 +2210,7 @@ int fnic_device_reset(struct scsi_device *sdev)

        req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
                                 BLK_MQ_REQ_NOWAIT);
-       if (!req) {
+       if (IS_ERR(req)) {
                /*
                 * Request allocation might fail, indicating that
                 * all tags are busy.
@@ -2221,7 +2221,8 @@ int fnic_device_reset(struct scsi_device *sdev)
                 * to host reset anyway.
                 */
                FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host,
- "Device reset allocation failed, all tags busy\n");
+                             "Device reset allocation failed with %d\n",
+                             PTR_ERR(req));
                return ret;
        }

(The first hunk is actually a bugfix, and will be included with
the next submission anyway.)
Please enable fnic debug during testing and send me the logs.
It _might_ be that this is actually by design, as we will be
failing if all tags are busy (ie if the error is EWOULDBLOCK).
But if we get EAGAIN it means that the queue is blocked and
we'll need to investigate.

Thanks for your help here.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux