Grant, I do agree with you, but we have reasons doing things this way. 1. This is only for development purpose. Normally when Firmware fault occurs driver will reset firmware so that further communication with firmware can resume. In this condition we were not able to track root cause of fault. In debug environment mpt_fwfault_debug will be set. In this case Driver will not reset firmware rather it will panic the system. Reason doing Fault. If we don't panic the system, we will get huge IO errors if any IO stress is running and error message will flood the syslog.(specially this things happens in Overnight stress test) Thoughts? Thanks Kashyap Desai -----Original Message----- From: Grant Grundler [mailto:grundler@xxxxxxxxxx] Sent: Wednesday, January 07, 2009 1:23 AM To: Desai, Kashyap Cc: linux-scsi@xxxxxxxxxxxxxxx; Moore, Eric; Prakash, Sathya; James.Bottomley@xxxxxxxxxxxxxxxxxxxxx Subject: Re: [PATCH 3/5] mpt fusion: Added Firmware debug support On Tue, Jan 6, 2009 at 1:33 AM, Kashyap, Desai <kashyap.desai@xxxxxxx> wrote: > > Added support for Firmware debuging > --- > > Signed-off-by: Kashyap Desai <kadesai@xxxxxxx> > --- > diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c > index 55d9a7e..ea3aafb 100644 > --- a/drivers/message/fusion/mptbase.c > +++ b/drivers/message/fusion/mptbase.c > @@ -107,6 +107,14 @@ module_param_call(mpt_debug_level, mpt_set_debug_level, param_get_int, > MODULE_PARM_DESC(mpt_debug_level, " debug level - refer to mptdebug.h \ > - (default=0)"); > > +int mpt_fwfault_debug; > +EXPORT_SYMBOL(mpt_fwfault_debug); > +module_param_call(mpt_fwfault_debug, param_set_int, param_get_int, > + &mpt_fwfault_debug, 0600); > +MODULE_PARM_DESC(mpt_fwfault_debug, "Enable detection of Firmware fault" > + " and halt Firmware on fault - (default=0)"); > + > + > > #ifdef MFCNT > static int mfcounter = 0; > @@ -6337,6 +6345,33 @@ mpt_print_ioc_summary(MPT_ADAPTER *ioc, char *buffer, int *size, int len, int sh > *size = y; > } > > + > +/** > + * mpt_halt_firmware - Halts the firmware if it is operational and panic > + * the kernel > + * @ioc: Pointer to MPT_ADAPTER structure > + * > + **/ > +void > +mpt_halt_firmware(MPT_ADAPTER *ioc) > +{ > + u32 ioc_raw_state; > + > + ioc_raw_state = mpt_GetIocState(ioc, 0); > + > + if ((ioc_raw_state & MPI_IOC_STATE_MASK) == MPI_IOC_STATE_FAULT) { > + printk(MYIOC_s_ERR_FMT "IOC is in FAULT state (%04xh)!!!\n", > + ioc->name, ioc_raw_state & MPI_DOORBELL_DATA_MASK); > + panic("%s: IOC Fault (%04xh)!!!\n", ioc->name, > + ioc_raw_state & MPI_DOORBELL_DATA_MASK); Wouldn't it be better to NOT panic the machine? One can still prod things "manually" (read config and MMIO space registers) for further debugging. I'm thinking it would make more sense to call the variable "mpt_fwfault_panic" and use it to decide if the machine should panic or not. Thoughts? grant > + } else { > + CHIPREG_WRITE32(&ioc->chip->Doorbell, 0xC0FFEE00); > + panic("%s: Firmware is halted due to command timeout\n", > + ioc->name); > + } > +} > +EXPORT_SYMBOL(mpt_halt_firmware); > + > /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/ > /* > * Reset Handling > @@ -6369,6 +6404,8 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag) > printk(MYIOC_s_INFO_FMT "HardResetHandler Entered!\n", ioc->name); > printk("MF count 0x%x !\n", ioc->mfcnt); > #endif > + if (mpt_fwfault_debug) > + mpt_halt_firmware(ioc); > > /* Reset the adapter. Prevent more than 1 call to > * mpt_do_ioc_recovery at any instant in time. > diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h > index dff048c..b3e981d 100644 > --- a/drivers/message/fusion/mptbase.h > +++ b/drivers/message/fusion/mptbase.h > @@ -922,11 +922,14 @@ extern void mpt_free_fw_memory(MPT_ADAPTER *ioc); > extern int mpt_findImVolumes(MPT_ADAPTER *ioc); > extern int mptbase_sas_persist_operation(MPT_ADAPTER *ioc, u8 persist_opcode); > extern int mpt_raid_phys_disk_pg0(MPT_ADAPTER *ioc, u8 phys_disk_num, pRaidPhysDiskPage0_t phys_disk); > +extern void mpt_halt_firmware(MPT_ADAPTER *ioc); > + > > /* > * Public data decl's... > */ > extern struct list_head ioc_list; > +extern int mpt_fwfault_debug; > > /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/ > #endif /* } __KERNEL__ */ > diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c > index ee09041..e62c6bc 100644 > --- a/drivers/message/fusion/mptscsih.c > +++ b/drivers/message/fusion/mptscsih.c > @@ -1846,6 +1846,9 @@ mptscsih_abort(struct scsi_cmnd * SCpnt) > if (hd->timeouts < -1) > hd->timeouts++; > > + if (mpt_fwfault_debug) > + mpt_halt_firmware(ioc); > + > /* Most important! Set TaskMsgContext to SCpnt's MsgContext! > * (the IO to be ABORT'd) > * > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html