RE: [PATCH v5 09/14] scsi: fnic: Modify IO path to use FDLS

"Karan Tilak Kumar (kartilak)" <kartilak@xxxxxxxxx> · Wed, 20 Nov 2024 00:03:42 +0000

On Monday, October 28, 2024 5:55 PM, Karan Tilak Kumar (kartilak) wrote:
> 
> On Thursday, October 24, 2024 12:05 AM, Hannes Reinecke <hare@xxxxxxx> wrote:
> > 
> > On 10/18/24 18:14, Karan Tilak Kumar wrote:
> > > Modify IO path to use FDLS.
> > > Add helper functions to process IOs.
> > > Remove unused template functions.
> > > Cleanup obsolete code.
> > > Refactor old function definitions.
> > > 
> > > Reviewed-by: Sesidhar Baddela <sebaddel@xxxxxxxxx>
> > > Reviewed-by: Arulprabhu Ponnusamy <arulponn@xxxxxxxxx>
> > > Reviewed-by: Gian Carlo Boffa <gcboffa@xxxxxxxxx>
> > > Reviewed-by: Arun Easi <aeasi@xxxxxxxxx>
> > > Signed-off-by: Karan Tilak Kumar <kartilak@xxxxxxxxx>
> > > @@ -2274,11 +2494,11 @@ static int fnic_clean_pending_aborts(struct fnic *fnic,
> > >   int fnic_device_reset(struct scsi_cmnd *sc)
> > >   {
> > >   	struct request *rq = scsi_cmd_to_rq(sc);
> > > -	struct fc_lport *lp;
> > >   	struct fnic *fnic;
> > >   	struct fnic_io_req *io_req = NULL;
> > >   	struct fc_rport *rport;
> > >   	int status;
> > > +	int count = 0;
> > >   	int ret = FAILED;
> > >   	unsigned long flags;
> > >   	unsigned long start_time = 0;
> > > @@ -2289,31 +2509,61 @@ int fnic_device_reset(struct scsi_cmnd *sc)
> > >   	DECLARE_COMPLETION_ONSTACK(tm_done);
> > >   	bool new_sc = 0;
> > >   	uint16_t hwq = 0;
> > > +	struct fnic_iport_s *iport = NULL;
> > > +	struct rport_dd_data_s *rdd_data;
> > > +	struct fnic_tport_s *tport;
> > > +	u32 old_soft_reset_count;
> > > +	u32 old_link_down_cnt;
> > > +	int exit_dr = 0;
> > >   
> > >   	/* Wait for rport to unblock */
> > >   	fc_block_scsi_eh(sc);
> > >   
> > >   	/* Get local-port, check ready and link up */
> > > -	lp = shost_priv(sc->device->host);
> > > +	fnic = *((struct fnic **) shost_priv(sc->device->host));
> > > +	iport = &fnic->iport;
> > >   
> > > -	fnic = lport_priv(lp);
> > >   	fnic_stats = &fnic->fnic_stats;
> > >   	reset_stats = &fnic->fnic_stats.reset_stats;
> > >   
> > >   	atomic64_inc(&reset_stats->device_resets);
> > >   
> > >   	rport = starget_to_rport(scsi_target(sc->device));
> > > +
> > > +	spin_lock_irqsave(&fnic->fnic_lock, flags);
> > >   	FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host, fnic->fnic_num,
> > > -		"fcid: 0x%x lun: 0x%llx hwq: %d mqtag: 0x%x flags: 0x%x Device reset\n",
> > > +		"fcid: 0x%x lun: %llu hwq: %d mqtag: 0x%x flags: 0x%x Device reset\n",
> > >   		rport->port_id, sc->device->lun, hwq, mqtag,
> > >   		fnic_priv(sc)->flags);
> > >   
> > > -	if (lp->state != LPORT_ST_READY || !(lp->link_up))
> > > +	rdd_data = rport->dd_data;
> > > +	tport = rdd_data->tport;
> > > +	if (!tport) {
> > > +		FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host, fnic->fnic_num,
> > > +		  "Dev rst called after tport delete! rport fcid: 0x%x lun: %llu\n",
> > > +		  rport->port_id, sc->device->lun);
> > > +		spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > > +		goto fnic_device_reset_end;
> > > +	}
> > > +
> > > +	if (iport->state != FNIC_IPORT_STATE_READY) {
> > > +		FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host, fnic->fnic_num,
> > > +					  "iport NOT in READY state");
> > > +		spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > >   		goto fnic_device_reset_end;
> > > +	}
> > > +
> > > +	if ((tport->state != FDLS_TGT_STATE_READY) &&
> > > +		(tport->state != FDLS_TGT_STATE_ADISC)) {
> > > +		FNIC_SCSI_DBG(KERN_ERR, fnic->lport->host, fnic->fnic_num,
> > > +					  "tport state: %d\n", tport->state);
> > > +		spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > > +		goto fnic_device_reset_end;
> > > +	}
> > > +	spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> > >  
> > Please check if returning FAST_IO_FAIL here wouldn't be a better option.
> > Most of the time a device reset is triggered by a command timeout, which
> > typically happens due to a transport issue (eg link is down or something).
> > So returning 'FAILED' will just escalate to host reset, and that can 
> > take for a really long time trying to abort all commands.
> 
> Thanks for your insights Hannes.
> We're investing this currently.
> 

Thanks again for this comment, Hannes.

We reviewed this internally and as we understand it, 
returning a FAST_IO_FAIL status would complete the scsi_cmnd. 
This would cause issues if the IO is still active in firmware. 
We do take care of cleaning up such I/Os in the higher error 
escalation path (host reset) though, which would be induced 
by returning the 'FAILED' return status here. 

Regards,
Karan