> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@xxxxxxx]
> Sent: Friday, January 24, 2014 1:54 PM
> To: Desai, Kashyap; James Bottomley
> Cc: linux-scsi@xxxxxxxxxxxxxxx; Adam Radford; Saxena, Sumit
> Subject: Re: [PATCH 1/6] megaraid_sas: Do not wait forever
>
> On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
> > Hannes:
> >
> > We have already worked on "wait_event" usage in "megasas_issue_blocked_cmd".
> > That code will be posted by LSI once we received test result from LSI Q/A team.
> >
> > If you see the current OCR code in Linux Driver we do "re-send the IOCTL command".
> > MR product does not want IOCTL timeout due to some reason. That is why
> > even if FW faulted, Driver will do OCR and re-send all existing <Management commands>
> > (IOCTL comes under management commands).
> >
> > Just for info. (see below snippet in OCR code)
> >
> > /* Re-fire management commands */
> > for (j = 0 ; j < instance->max_fw_cmds; j++) {
> > 	cmd_fusion = fusion->cmd_list[j];
> > 	if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
> > 		cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
> > 		if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
> > 			megasas_return_cmd(instance, cmd_mfi);
> > 			megasas_return_cmd_fusion(instance, cmd_fusion);
> >
> >
> > Current <MR> Driver is not designed to add <timeout> for DCMD and IOCTL path.
> > [ I added timeout only for limited DCMDs, which are harmless to continue after timeout ]
> >
> > As of now, you can skip this patch and we will be submitting patch to fix similar issue.
> > But note, we cannot add complete "wait_event_timeout" due to day-1
> > design, but will try to cover wait_event_timout for some valid cases.
> >
> Ouch.
>
> The reason I sent this patch is that I've got an Intel box here, which blocks
> megaraid_sas initialisation when the IOMMU is turned on:
>
> [ 21.867264] megasas: io_request_frames ffff880800f50000
> [ 21.867363] megasas: init frame 00000000fff57000
> [ 22.223234] megasas: frame status 00
> [ 22.223235] megasas: IOC Init cmd success
> [ 22.223282] megasas: ld map ffff88080b600000
> [ 22.223289] megasas: issue dcmd 05 opcode 300e101
> [ 22.244184] dmar: DRHD: handling fault status reg 2
> [ 22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6980000
> [ 22.244186] DMAR:[fault reason 06] PTE Read access is not set
> [ 22.247223] megasas: frame status 00
> [ 22.247231] megasas: issue dcmd 05 opcode 300e101
> [ 22.247231] megasas: INIT adapter done
> [ 22.247237] megasas: pd list ffff88080cfd0000 size 8192
> [ 22.247237] megasas: issue dcmd 05 opcode 2010100
> [ 22.253516] dmar: DRHD: handling fault status reg 102
> [ 22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
> [ 22.253518] DMAR:[fault reason 05] PTE Write access is not set
> [ 22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
> [ 22.253521] DMAR:[fault reason 05] PTE Write access is not set
> [ 22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
>
> [ Some more DMAR messages snipped ]
>
> [ 22.273199] dmar: DRHD: handling fault status reg 2
> [ 22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6cef000
> [ 22.273201] DMAR:[fault reason 06] PTE Read access is not set
>
> [ .. ]
>
> [ 94.222456] megasas: frame status ff
> [ 94.240946] megasas: failed to get PD list
>
> (I've inserted some debugging messages :-)
>
> This is really weird. The 'write' faults do correspond with the number of
> (megaraid) commands, reserved at the initial step.
> (This is a 'Fury' card, btw).

The Fury card runs iMR FW, and we have seen issues with iMR FW when the IOMMU is on, but nothing like a driver load failure.
Is your OS driver behind Fury? What RAID type is used on your setup? Which system are you using?

> What is more puzzling is that the INIT command and the initial LD List
> command goes through, but the PD List command gets blocked.
>
> Incidentally, this is not consistent; occasionally even the LD List command
> gets blocked, and the DMAR messages occur earlier.

The LD command uses megasas_issue_polled, which is already a timeout-based mechanism.
Below is the list of DCMD commands which use an infinite timeout:

megasas_get_seq_num
megasas_flush_cache
megasas_shutdown_controller
megasas_mgmt_fw_ioctl

We can convert all DCMDs except IOCTL to use a timeout value.
In your case "megasas_get_seq_num" might be hanging in FW. It cannot be "megasas_get_ld_list".

>
> Anyway. Point is, if we cannot timout these initial commands the
> megaraid_sas driver will be stuck during initialisation (as the loop _never_
> terminates).
> Which in turn means that the modprobe command hangs indefinitely, and
> you cannot even unload the module.
> The only way to recover here is a reboot.
> Nasty.
>
> Hence the patch for the timeout; when this triggers the HBA is pretty much
> hosed anyway, so the state of the firmware is pretty much irrelevant here.
> But at least you can continue to boot.
>
> (And OCR doesn't work at this point, neither. But that's a different story).
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> hare@xxxxxxx +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)