Yes Thomas, we need to reset the non_operational_loop to zero after the transient event. Thanks, Sreekanth. -----Original Message----- From: Tomas Henzl [mailto:thenzl@xxxxxxxxxx] Sent: Monday, December 17, 2012 6:43 PM To: Reddy, Sreekanth Cc: jejb@xxxxxxxxxx; Nandigama, Nagalakshmi; JBottomley@xxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx; Prakash, Sathya Subject: Re: [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error On 12/17/2012 10:58 PM, Sreekanth Reddy wrote: > This patch stops the driver to invoke kthread (which remove the dead > ioc) for some time while EEH recovery has started. Thank you for posting this, the issue we have seen is resolved now. Shouldn't be an additional initialization added? So after a transient event the non_operational_loop is reset again? Tomas diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index fd3b3d7..480111c 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -208,6 +208,8 @@ _base_fault_reset_work(struct work_struct *work) return; /* don't rearm timer */ } + ioc->non_operational_loop = 0; + if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) { rc = mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, FORCE_BIG_HAMMER); > > Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@xxxxxxx> > --- > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c > b/drivers/scsi/mpt2sas/mpt2sas_base.c > index ffd85c5..2349531 100755 > --- a/drivers/scsi/mpt2sas/mpt2sas_base.c > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c > @@ -155,7 +155,7 @@ _base_fault_reset_work(struct work_struct *work) > struct task_struct *p; > > spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags); > - if (ioc->shost_recovery) > + if (ioc->shost_recovery || ioc->pci_error_recovery) > goto rearm_timer; > spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags); > > @@ -164,6 +164,20 @@ _base_fault_reset_work(struct work_struct *work) > printk(MPT2SAS_INFO_FMT "%s : SAS host is non-operational !!!!\n", > ioc->name, __func__); > > + /* It may be possible that EEH recovery can resolve some of > + * pci bus failure issues rather removing the dead ioc function > + * by considering controller is in a non-operational state. So > + * here priority is given to the EEH recovery. If it doesn't > + * not resolve this issue, mpt2sas driver will consider this > + * controller to non-operational state and remove the dead ioc > + * function. > + */ > + if (ioc->non_operational_loop++ < 5) { > + spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, > + flags); > + goto rearm_timer; > + } > + > /* > * Call _scsih_flush_pending_cmds callback so that we flush all > * pending commands back to OS. This call is required to aovid @@ > -4386,6 +4400,7 @@ mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc) > if (missing_delay[0] != -1 && missing_delay[1] != -1) > _base_update_missing_delay(ioc, missing_delay[0], > missing_delay[1]); > + ioc->non_operational_loop = 0; > > return 0; > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h > b/drivers/scsi/mpt2sas/mpt2sas_base.h > index 543d8d6..c6ee7aa 100755 > --- a/drivers/scsi/mpt2sas/mpt2sas_base.h > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h > @@ -835,6 +835,7 @@ struct MPT2SAS_ADAPTER { > u16 cpu_msix_table_sz; > u32 ioc_reset_count; > MPT2SAS_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds; > + u32 non_operational_loop; > > /* internal commands, callback index */ > u8 scsi_io_cb_idx; > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" > in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo > info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html