On Sunday, November 19, 2023 11:50 PM, Hannes Reinecke <hare@xxxxxxx> wrote:
> Seems that 'blk_queue_enter()' called from scsi_alloc_request() is
> failing, presumably as the queue is frozen/quiesced.
> Can you try with the attached patch instead of the previous debug patch?
>
> On, and incidentally: there's an unlock missing:
>
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index 0278c4a207f3..47bcc6bd7376 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
> @@ -2233,8 +2233,10 @@ int fnic_device_reset(struct scsi_device *sdev)
>  	io_lock = fnic_io_lock_hash(fnic, sc);
>  	spin_lock_irqsave(io_lock, flags);
>  	io_req = fnic_priv(sc)->io_req;
> -	if (io_req)
> +	if (io_req) {
> +		spin_unlock_irqrestore(io_lock, flags);
>  		goto fnic_device_reset_end;
> +	}
>
>  	io_req = mempool_alloc(fnic->io_req_pool, GFP_ATOMIC);
>  	if (!io_req) {
>
> Maybe fold it in with your patchset (if it's not already merged).
>

Thanks Hannes. I've modified the code based on your patch.
Here's the repro log:

Nov 20 13:59:01 rhel-c4s5 kernel: fnic<7>: UT: fnic_fcpio_icmnd_cmpl_handler: 847: tag: 0xf sc: 00000000d0bc6014 CDB Opcode: 0x28 Dropping icmnd completion
Nov 20 13:59:31 rhel-c4s5 kernel: scsi host7: Abort Cmd called FCID 0x52061b, LUN 0x2 TAG f flags 3
Nov 20 13:59:31 rhel-c4s5 kernel: scsi host7: CBD Opcode: 28 Abort issued time: 30029 msec
Nov 20 13:59:31 rhel-c4s5 kernel: scsi host7: fnic<7>: UT: fnic_fcpio_itmf_cmpl_handler: 1113: tag: 0xf sc: 00 status: FCPIO_IO_NOT_FOUND Dropping abort completion
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Returning from abort cmd type 2 FAILED
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Device reset called FCID 0x52061b, LUN 0x2 sc: 00000000d0bc6014
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Device reset allocation failed (error -11)
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_reset called
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0xfffffc
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x52061b
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205f2
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205cb
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205ca
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: update_mac 00:25:b5:cc:aa:00
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Issued fw reset
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: set port_id 0 fp 0000000000000000
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Returning from fnic reset SUCCESS
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0xfffffc
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_cleanup_io: tag:0xf : sc:0x00000000d0bc6014 duration = 52089 DID_TRANSPORT_DISRUPTED
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: Calling done for IO not issued to fw: tag:0xf sc:0x00000000d0bc6014
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: reset cmpl success
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x52061b
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205f2
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205cb
Nov 20 13:59:53 rhel-c4s5 kernel: scsi host7: fnic_rport_exch_reset called portid 0x5205ca
Nov 20 13:59:55 rhel-c4s5 kernel: host7: Assigned Port ID 0d0880
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: set port_id d0880 fp 00000000a1c453b9
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: update_mac 0e:fc:00:0d:08:80
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: FLOGI reg issued fcid d0880 map 0 dest 8c:60:4f:95:ea:a4
Nov 20 13:59:55 rhel-c4s5 kernel: scsi host7: flog reg succeeded
Nov 20 14:00:06 rhel-c4s5 kernel: sd 7:0:3:2: Power-on or device reset occurred
Nov 20 14:00:06 rhel-c4s5 kernel: sd 7:0:3:2: alua: transition timeout set to 120 seconds
Nov 20 14:00:06 rhel-c4s5 kernel: sd 7:0:3:2: alua: port group 3e9 state A non-preferred supports TolUsNA

This is what the code looks like for the above test:

	req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
				 BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PM);
	if (IS_ERR(req)) {
		/*
		 * Request allocation might fail, indicating that
		 * all tags are busy.
		 * But device reset will be called only from within
		 * SCSI EH, at which time all I/O is stopped. So the
		 * only active tags would be for failed I/O, but
		 * when all I/O is failed it'll be better to escalate
		 * to host reset anyway.
		 */
		FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host,
			      "Device reset allocation failed (error %ld%s%s)\n",
			      PTR_ERR(req),
			      sdev->request_queue->mq_freeze_depth ? ",frozen" : "",
			      sdev->quiesced_by ? ",quiesced" : "");
		return ret;
	}

	sc = blk_mq_rq_to_pdu(req);
	tag = req->tag;

	io_lock = fnic_io_lock_hash(fnic, sc);
	spin_lock_irqsave(io_lock, flags);
	io_req = fnic_priv(sc)->io_req;
	if (io_req) {
		spin_unlock_irqrestore(io_lock, flags);
		goto fnic_device_reset_end;
	}

Regards,
Karan
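
For reference, the unlock-before-goto fix quoted at the top of this mail can be sketched outside the kernel. This is only a userspace illustration, not fnic code: it assumes a pthread mutex in place of the io_lock spinlock, and `demo_cmd`, `demo_io_lock`, `demo_claim_cmd()` and `demo_lock_is_free()` are hypothetical names made up for the example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-command private data (fnic_priv(sc)). */
struct demo_cmd {
	void *io_req;
};

/* Hypothetical stand-in for the hashed io_lock; a mutex, not a spinlock. */
static pthread_mutex_t demo_io_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Attach new_req to cmd unless the command already has an io_req.
 * Returns 0 if the command was claimed, -1 if it was already in use.
 */
static int demo_claim_cmd(struct demo_cmd *cmd, void *new_req)
{
	int ret = -1;

	pthread_mutex_lock(&demo_io_lock);
	if (cmd->io_req) {
		/* Early exit: drop the lock before jumping to the exit label. */
		pthread_mutex_unlock(&demo_io_lock);
		goto out;
	}
	cmd->io_req = new_req;
	pthread_mutex_unlock(&demo_io_lock);
	ret = 0;
out:
	return ret;
}

/* Returns true if the lock can be taken right now, i.e. nobody leaked it. */
static bool demo_lock_is_free(void)
{
	if (pthread_mutex_trylock(&demo_io_lock) != 0)
		return false;
	pthread_mutex_unlock(&demo_io_lock);
	return true;
}
```

Calling demo_claim_cmd() twice on the same command takes the early-exit path the second time, and demo_lock_is_free() should still report the lock as free afterwards. Without the unlock in the early-exit branch, the lock would stay held after that second call.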