On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
> Hannes:
>
> We have already worked on "wait_event" usage in "megasas_issue_blocked_cmd".
> That code will be posted by LSI once we received test result from LSI Q/A team.
>
> If you see the current OCR code in Linux Driver we do "re-send the IOCTL command".
> MR product does not want IOCTL timeout due to some reason. That is why even if
> FW faulted, Driver will do OCR and re-send all existing <Management commands>
> (IOCTL comes under management commands).
>
> Just for info. (see below snippet in OCR code)
>
> /* Re-fire management commands */
> for (j = 0 ; j < instance->max_fw_cmds; j++) {
>         cmd_fusion = fusion->cmd_list[j];
>         if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
>                 cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
>                 if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
>                         megasas_return_cmd(instance, cmd_mfi);
>                         megasas_return_cmd_fusion(instance, cmd_fusion);
>
>
>
> Current <MR> Driver is not designed to add <timeout> for DCMD and IOCTL path.
> [ I added timeout only for limited DCMDs, which are harmless to continue after timeout ]
>
> As of now, you can skip this patch and we will be submitting patch to fix similar issue.
> But note, we cannot add complete "wait_event_timeout" due to day-1 design, but will
> try to cover wait_event_timout for some valid cases.
>
Ouch.

The reason I sent this patch is that I've got an Intel box here which
blocks megaraid_sas initialisation when the IOMMU is turned on:

[   21.867264] megasas: io_request_frames ffff880800f50000
[   21.867363] megasas: init frame 00000000fff57000
[   22.223234] megasas: frame status 00
[   22.223235] megasas: IOC Init cmd success
[   22.223282] megasas: ld map ffff88080b600000
[   22.223289] megasas: issue dcmd 05 opcode 300e101
[   22.244184] dmar: DRHD: handling fault status reg 2
[   22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6980000
[   22.244186] DMAR:[fault reason 06] PTE Read access is not set
[   22.247223] megasas: frame status 00
[   22.247231] megasas: issue dcmd 05 opcode 300e101
[   22.247231] megasas: INIT adapter done
[   22.247237] megasas: pd list ffff88080cfd0000 size 8192
[   22.247237] megasas: issue dcmd 05 opcode 2010100
[   22.253516] dmar: DRHD: handling fault status reg 102
[   22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
[   22.253518] DMAR:[fault reason 05] PTE Write access is not set
[   22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
[   22.253521] DMAR:[fault reason 05] PTE Write access is not set
[   22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
[ Some more DMAR messages snipped ]
[   22.273199] dmar: DRHD: handling fault status reg 2
[   22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6cef000
[   22.273201] DMAR:[fault reason 06] PTE Read access is not set
[ .. ]
[   94.222456] megasas: frame status ff
[   94.240946] megasas: failed to get PD list

(I've inserted some debugging messages :-)

This is really weird. The 'write' faults do correspond with the number
of (megaraid) commands reserved at the initial step.
(This is a 'Fury' card, btw).

What is more puzzling is that the INIT command and the initial LD List
command go through, but the PD List command gets blocked.
Incidentally, this is not consistent; occasionally even the LD List
command gets blocked, and the DMAR messages occur earlier.

Anyway. Point is, if we cannot time out these initial commands, the
megaraid_sas driver will be stuck during initialisation (as the loop
_never_ terminates). Which in turn means that the modprobe command
hangs indefinitely, and you cannot even unload the module.
The only way to recover here is a reboot. Nasty.

Hence the patch for the timeout; when this triggers the HBA is pretty
much hosed anyway, so the state of the firmware is pretty much
irrelevant here. But at least you can continue to boot.

(And OCR doesn't work at this point, either. But that's a different
story).

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@xxxxxxx                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)