RE: [PATCH 1/6] megaraid_sas: Do not wait forever

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@xxxxxxx]
> Sent: Friday, January 24, 2014 1:54 PM
> To: Desai, Kashyap; James Bottomley
> Cc: linux-scsi@xxxxxxxxxxxxxxx; Adam Radford; Saxena, Sumit
> Subject: Re: [PATCH 1/6] megaraid_sas: Do not wait forever
> 
> On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
> > Hannes:
> >
> > We have already worked on the "wait_event" usage in "megasas_issue_blocked_cmd".
> > That code will be posted by LSI once we receive test results from the LSI Q/A team.
> >
> > If you look at the current OCR code in the Linux driver, we re-send the IOCTL command.
> > The MR product does not want IOCTLs to time out, for some reason. That is why,
> > even if the FW has faulted, the driver will do OCR and re-send all existing
> > <Management commands> (IOCTL comes under management commands).
> >
> > Just for info, see the snippet below from the OCR code:
> >
> > /* Re-fire management commands */
> > for (j = 0 ; j < instance->max_fw_cmds; j++) {
> >         cmd_fusion = fusion->cmd_list[j];
> >         if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
> >                 cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
> >                 if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
> >                         megasas_return_cmd(instance, cmd_mfi);
> >                         megasas_return_cmd_fusion(instance, cmd_fusion);
> >
> >
> >
> > The current <MR> driver is not designed to add a <timeout> to the DCMD and IOCTL path.
> > [ I added a timeout only for a limited set of DCMDs, which are harmless to continue after a timeout ]
> >
> > For now, you can skip this patch; we will be submitting a patch to fix a similar issue.
> > But note, we cannot add a complete "wait_event_timeout" due to the day-1
> > design, but will try to cover wait_event_timeout for some valid cases.
> >
> Ouch.
> 
> The reason I sent this patch is that I've got an Intel box here, which blocks
> megaraid_sas initialisation when the IOMMU is turned on:
> 
> [   21.867264] megasas: io_request_frames ffff880800f50000
> [   21.867363] megasas: init frame 00000000fff57000
> [   22.223234] megasas: frame status 00
> [   22.223235] megasas: IOC Init cmd success
> [   22.223282] megasas: ld map ffff88080b600000
> [   22.223289] megasas: issue dcmd 05 opcode 300e101
> [   22.244184] dmar: DRHD: handling fault status reg 2
> [   22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6980000
> [   22.244186] DMAR:[fault reason 06] PTE Read access is not set
> [   22.247223] megasas: frame status 00
> [   22.247231] megasas: issue dcmd 05 opcode 300e101
> [   22.247231] megasas: INIT adapter done
> [   22.247237] megasas: pd list ffff88080cfd0000 size 8192
> [   22.247237] megasas: issue dcmd 05 opcode 2010100
> [   22.253516] dmar: DRHD: handling fault status reg 102
> [   22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
> [   22.253518] DMAR:[fault reason 05] PTE Write access is not set
> [   22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
> [   22.253521] DMAR:[fault reason 05] PTE Write access is not set
> [   22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
> 
> [ Some more DMAR messages snipped ]
> 
> [   22.273199] dmar: DRHD: handling fault status reg 2
> [   22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6cef000
> [   22.273201] DMAR:[fault reason 06] PTE Read access is not set
> 
> [ .. ]
> 
> [   94.222456] megasas: frame status ff
> [   94.240946] megasas: failed to get PD list
> 
> (I've inserted some debugging messages :-)
> 
> This is really weird. The 'write' faults do correspond with the number of
> (megaraid) commands, reserved at the initial step.
> (This is a 'Fury' card, btw).

Fury cards run iMR FW, and we have seen issues with iMR FW when the IOMMU is on, but nothing like a driver load failure.
Is your OS drive behind the Fury controller? Which RAID type is used in your setup?

Which system are you using?

> What is more puzzling is that the INIT command and the initial LD List
> command goes through, but the PD List command gets blocked.
> 
> Incidentally, this is not consistent; occasionally even the LD List command
> gets blocked, and the DMAR messages occur earlier.

The LD command uses megasas_issue_polled, which is already a timeout-based mechanism.
Below is the list of DCMD paths that use an infinite timeout:

megasas_get_seq_num
megasas_flush_cache
megasas_shutdown_controller
megasas_mgmt_fw_ioctl 


We can convert all DCMDs except IOCTL to use a timeout value. In your case, "megasas_get_seq_num" might be hung in FW. It cannot be "megasas_get_ld_list".
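
To illustrate the difference between an unbounded wait and a bounded one, here is a minimal userspace sketch of the pattern, not driver code. It uses POSIX threads as a stand-in for the kernel's wait_event_timeout; struct demo_cmd and the demo_cmd_* helpers are hypothetical names made up for this example and do not exist in megaraid_sas.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a driver command; NOT the real struct megasas_cmd. */
struct demo_cmd {
	pthread_mutex_t lock;
	pthread_cond_t  done_cv;
	bool            done;	/* set by the completion path */
};

static void demo_cmd_init(struct demo_cmd *cmd)
{
	pthread_mutex_init(&cmd->lock, NULL);
	pthread_cond_init(&cmd->done_cv, NULL);
	cmd->done = false;
}

/* Completion path: roughly what an interrupt handler would do in a driver. */
static void demo_cmd_complete(struct demo_cmd *cmd)
{
	pthread_mutex_lock(&cmd->lock);
	cmd->done = true;
	pthread_cond_signal(&cmd->done_cv);
	pthread_mutex_unlock(&cmd->lock);
}

/*
 * Bounded wait: returns 0 if the command completed, or -ETIMEDOUT if the
 * deadline passed first. An unbounded wait would loop on
 * pthread_cond_wait() instead and never return if completion never comes.
 */
static int demo_cmd_wait_timeout(struct demo_cmd *cmd, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&cmd->lock);
	/* Re-check the flag after every wakeup; stop on completion or error. */
	while (!cmd->done && rc == 0)
		rc = pthread_cond_timedwait(&cmd->done_cv, &cmd->lock, &deadline);
	done = cmd->done;
	pthread_mutex_unlock(&cmd->lock);

	return done ? 0 : -ETIMEDOUT;
}
```

With this shape the caller gets control back after the timeout and can decide what to do with the command (fail the load, mark the adapter dead, and so on) instead of blocking forever.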


> 
> Anyway. Point is, if we cannot time out these initial commands the
> megaraid_sas driver will be stuck during initialisation (as the loop _never_
> terminates).
> Which in turn means that the modprobe command hangs indefinitely, and
> you cannot even unload the module.
> The only way to recover here is a reboot.
> Nasty.
> 
> Hence the patch for the timeout; when this triggers the HBA is pretty much
> hosed anyway, so the state of the firmware is pretty much irrelevant here.
> But at least you can continue to boot.
> 
> (And OCR doesn't work at this point, either. But that's a different story).
> 
> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@xxxxxxx			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)




