RE: [PATCH 13/16] fnic: allocate device reset command on the fly

On Sunday, November 19, 2023 11:50 PM, Hannes Reinecke <hare@xxxxxxx> wrote:
> Seems that 'blk_queue_enter()' called from scsi_alloc_request() is
> failing, presumably as the queue is frozen/quiesced.
> Can you try with the attached patch instead of the previous debug patch?
>
> Oh, and incidentally: there's an unlock missing:
>
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index 0278c4a207f3..47bcc6bd7376 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
> @@ -2233,8 +2233,10 @@ int fnic_device_reset(struct scsi_device *sdev)
>         io_lock = fnic_io_lock_hash(fnic, sc);
>         spin_lock_irqsave(io_lock, flags);
>         io_req = fnic_priv(sc)->io_req;
> -       if (io_req)
> +       if (io_req) {
> +               spin_unlock_irqrestore(io_lock, flags);
>                 goto fnic_device_reset_end;
> +       }
>
>         io_req = mempool_alloc(fnic->io_req_pool, GFP_ATOMIC);
>         if (!io_req) {
>
> Maybe fold it in with your patchset (if it's not already merged).
>
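A minimal userspace sketch of the locking rule behind the quoted fix, assuming a pthread mutex as a stand-in for spin_lock_irqsave()/spin_unlock_irqrestore(): every path that leaves after taking the lock, including the early goto, has to drop it first. struct fake_cmd, cmd_lock(), cmd_unlock() and fake_device_reset() are hypothetical names for illustration, not fnic driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a SCSI command and its per-command lock.
 * A pthread mutex replaces the kernel spinlock in this sketch.
 */
struct fake_cmd {
	pthread_mutex_t io_lock;
	void *io_req;    /* non-NULL if a request is already attached */
	int lock_depth;  /* debug counter: times the lock is currently held */
};

static void cmd_lock(struct fake_cmd *c)
{
	pthread_mutex_lock(&c->io_lock);
	c->lock_depth++;
}

static void cmd_unlock(struct fake_cmd *c)
{
	assert(c->lock_depth > 0);  /* unlocking a lock we do not hold is a bug */
	c->lock_depth--;
	pthread_mutex_unlock(&c->io_lock);
}

/* Returns 0 if new_req was attached, -1 if a request was already present. */
static int fake_device_reset(struct fake_cmd *c, void *new_req)
{
	int ret = -1;

	cmd_lock(c);
	if (c->io_req) {
		/* The fix: release the lock on the early-exit path as well. */
		cmd_unlock(c);
		goto out;
	}
	c->io_req = new_req;
	cmd_unlock(c);
	ret = 0;
out:
	return ret;
}
```

Without the cmd_unlock() inside the if-block, the second call would return with the mutex still held, and any later attempt to take the lock would deadlock; in the kernel the leaked spinlock also leaves interrupts disabled.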

Thanks Hannes. I've modified the code based on your patch. Here's the repro log:

Nov 20 13:59:01 rhel-c4s5 kernel: fnic<7>: UT: fnic_fcpio_icmnd_cmpl_handler: 847: tag: 0xf sc: 00000000d0bc6014 CDB Opcode: 0x28 Dropping icmnd completion
Nov 20 13:59:31 rhel-c4s5 kernel: scsi host7: Abort Cmd called FCID 0x52061b, LUN 0x2 TAG f flags 3
Nov 20 13:59:31 rhel-c4s5 kernel: scsi host7: CBD Opcode: 28 Abort issued time: 30029 msec
Nov 20 13:59:31 rhel-c4s5 kernel: scsi host7: fnic<7>: UT: fnic_fcpio_itmf_cmpl_handler: 1113: tag: 0xf sc: 00 status: FCPIO_IO_NOT_FOUND Dropping abort completion
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Returning from abort cmd type 2 FAILED
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Device reset called FCID 0x52061b, LUN 0x2 sc: 00000000d0bc6014
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Device reset allocation failed (error -11)
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_reset called
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0xfffffc
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x52061b
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205f2
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205cb
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205ca
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: update_mac 00:25:b5:cc:aa:00
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Issued fw reset
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: set port_id 0 fp 0000000000000000
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Returning from fnic reset SUCCESS
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0xfffffc
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_cleanup_io: tag:0xf : sc:0x00000000d0bc6014 duration = 52089 DID_TRANSPORT_DISRUPTED
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Calling done for IO not issued to fw: tag:0xf sc:0x00000000d0bc6014
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: reset cmpl success
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x52061b
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205f2
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205cb
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205ca
Nov 20 13:59:55 rhel-c4s5 kernel: host7: Assigned Port ID 0d0880
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: set port_id d0880 fp 00000000a1c453b9
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: update_mac 0e:fc:00:0d:08:80
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: FLOGI reg issued fcid d0880 map 0 dest 8c:60:4f:95:ea:a4
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: flog reg succeeded
Nov 20 14:00:06 rhel-c4s5 kernel: sd 7:0:3:2: Power-on or device reset occurred
Nov 20 14:00:06 rhel-c4s5 kernel: sd 7:0:3:2: alua: transition timeout set to 120 seconds
Nov 20 14:00:06 rhel-c4s5 kernel: sd 7:0:3:2: alua: port group 3e9 state A non-preferred supports TolUsNA

This is what the code looks like for the above test:

        req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
                                 BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PM);
        if (IS_ERR(req)) {
                /*
                 * Request allocation might fail, indicating that
                 * all tags are busy.
                 * But device reset will be called only from within
                 * SCSI EH, at which time all I/O is stopped. So the
                 * only active tags would be for failed I/O, but
                 * when all I/O is failed it'll be better to escalate
                 * to host reset anyway.
                 */
                FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host,
                              "Device reset allocation failed (error %ld%s%s)\n",
                              PTR_ERR(req),
                              sdev->request_queue->mq_freeze_depth ? ",frozen" : "",
                              sdev->quiesced_by ? ",quiesced" : "");
                return ret;
        }
        sc = blk_mq_rq_to_pdu(req);

        tag = req->tag;
        io_lock = fnic_io_lock_hash(fnic, sc);
        spin_lock_irqsave(io_lock, flags);
        io_req = fnic_priv(sc)->io_req;
        if (io_req) {
                spin_unlock_irqrestore(io_lock, flags);
                goto fnic_device_reset_end;
        }

Regards,
Karan