On Monday, October 28, 2024 5:55 PM, Karan Tilak Kumar (kartilak) wrote: > > On Thursday, October 24, 2024 12:05 AM, Hannes Reinecke <hare@xxxxxxx> wrote: > > > > On 10/18/24 18:14, Karan Tilak Kumar wrote: > > > Modify IO path to use FDLS. > > > Add helper functions to process IOs. > > > Remove unused template functions. > > > Cleanup obsolete code. > > > Refactor old function definitions. > > > > > > Reviewed-by: Sesidhar Baddela <sebaddel@xxxxxxxxx> > > > Reviewed-by: Arulprabhu Ponnusamy <arulponn@xxxxxxxxx> > > > Reviewed-by: Gian Carlo Boffa <gcboffa@xxxxxxxxx> > > > Reviewed-by: Arun Easi <aeasi@xxxxxxxxx> > > > Signed-off-by: Karan Tilak Kumar <kartilak@xxxxxxxxx> > > > @@ -2274,11 +2494,11 @@ static int fnic_clean_pending_aborts(struct fnic *fnic, > > > int fnic_device_reset(struct scsi_cmnd *sc) > > > { > > > struct request *rq = scsi_cmd_to_rq(sc); > > > - struct fc_lport *lp; > > > struct fnic *fnic; > > > struct fnic_io_req *io_req = NULL; > > > struct fc_rport *rport; > > > int status; > > > + int count = 0; > > > int ret = FAILED; > > > unsigned long flags; > > > unsigned long start_time = 0; > > > @@ -2289,31 +2509,61 @@ int fnic_device_reset(struct scsi_cmnd *sc) > > > DECLARE_COMPLETION_ONSTACK(tm_done); > > > bool new_sc = 0; > > > uint16_t hwq = 0; > > > + struct fnic_iport_s *iport = NULL; > > > + struct rport_dd_data_s *rdd_data; > > > + struct fnic_tport_s *tport; > > > + u32 old_soft_reset_count; > > > + u32 old_link_down_cnt; > > > + int exit_dr = 0; > > > > > > /* Wait for rport to unblock */ > > > fc_block_scsi_eh(sc); > > > > > > /* Get local-port, check ready and link up */ > > > - lp = shost_priv(sc->device->host); > > > + fnic = *((struct fnic **) shost_priv(sc->device->host)); > > > + iport = &fnic->iport; > > > > > > - fnic = lport_priv(lp); > > > fnic_stats = &fnic->fnic_stats; > > > reset_stats = &fnic->fnic_stats.reset_stats; > > > > > > atomic64_inc(&reset_stats->device_resets); > > > > > > rport = starget_to_rport(scsi_target(sc->device)); > > > + > > > + spin_lock_irqsave(&fnic->fnic_lock, flags); > > > FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host, fnic->fnic_num, > > > - "fcid: 0x%x lun: 0x%llx hwq: %d mqtag: 0x%x flags: 0x%x Device reset\n", > > > + "fcid: 0x%x lun: %llu hwq: %d mqtag: 0x%x flags: 0x%x Device reset\n", > > > rport->port_id, sc->device->lun, hwq, mqtag, > > > fnic_priv(sc)->flags); > > > > > > - if (lp->state != LPORT_ST_READY || !(lp->link_up)) > > > + rdd_data = rport->dd_data; > > > + tport = rdd_data->tport; > > > + if (!tport) { > > > + FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host, fnic->fnic_num, > > > + "Dev rst called after tport delete! rport fcid: 0x%x lun: %llu\n", > > > + rport->port_id, sc->device->lun); > > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags); > > > + goto fnic_device_reset_end; > > > + } > > > + > > > + if (iport->state != FNIC_IPORT_STATE_READY) { > > > + FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host, fnic->fnic_num, > > > + "iport NOT in READY state"); > > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags); > > > goto fnic_device_reset_end; > > > + } > > > + > > > + if ((tport->state != FDLS_TGT_STATE_READY) && > > > + (tport->state != FDLS_TGT_STATE_ADISC)) { > > > + FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host, fnic->fnic_num, > > > + "tport state: %d\n", tport->state); > > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags); > > > + goto fnic_device_reset_end; > > > + } > > > + spin_unlock_irqrestore(&fnic->fnic_lock, flags); > > > > > Please check if returning FAST_IO_FAIL here wouldn't be a better option. > > Most of the time a device reset is triggered by a command timeout, which > > typically happens due to a transport issue (eg link is down or something). > > So returning 'FAILED' will just escalate to host reset, and that can > > take for a really long time trying to abort all commands. > > Thanks for your insights Hannes. > We're investing this currently. > Thanks again for this comment, Hannes. We reviewed this internally and as we understand it, returning a FAST_IO_FAIL status would complete the scsi_cmnd. This would cause issues if the IO is still active in firmware. We do take care of cleaning up such I/Os in the higher error escalation path (host reset) though, which would be induced by returning the 'FAILED' return status here. Regards, Karan