Re: [PATCH] pciehp: only wait command complete for really hotplug control

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 25, 2014 at 11:27 AM, Rajat Jain <rajatxjain@xxxxxxxxx> wrote:
> On Mon, Feb 24, 2014 at 4:46 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> [+cc Rajat]
>>
>> On Mon, Feb 24, 2014 at 4:59 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>>> On system with 16 PCI express hotplug slots, customer complain every slot
>>> will report "Command not completed in 1000 msec" during initialization.
>>>
>>> Intel says that we should only wait command complete only for
>>>            Electromechanical Interlock Control
>>>            Power Controller Control
>>>            Power Indicator Control
>>>            Attention Indicator Control
>
>
> Hello,
>
> I have a HW that generates Command Completion notifications, even
> though it does not implement any of the EMI, Power Controller, Power
> Indicator, Attention indicator. Although I'm not sure whether or not
> this behavior is compliant to PCIe spec, just wanted to pitch in to
> convey that there are HW which have this behavior:
>
> PCI bridge: Integrated Device Technology, Inc. Device 807a (rev 02)
> (prog-if 00 [Normal decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Bus: primary=02, secondary=30, subordinate=3f, sec-latency=0
> I/O behind bridge: 00009000-00009fff
> Memory behind bridge: 88000000-8bffffff
> Prefetchable memory behind bridge: 00000000b1000000-00000000b11fffff
> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- <SERR- <PERR-
> BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>         PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00
>         DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                 ExtTag+ RBE+ FLReset-
>         DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                 RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                 MaxPayload 128 bytes, MaxReadReq 128 bytes
>         DevSta: CorrErr+ UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
>         LnkCap: Port #13, Speed 5GT/s, Width x4, ASPM L0s L1, Latency
> L0 <4us, L1 <4us
>                 ClockPM- Surprise+ LLActRep+ BwNot+
>         LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
>                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk-
> DLActive+ BWMgmt- ABWMgmt-
>         SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
>                 Slot #13, PowerLimit 0.000W; Interlock- NoCompl-
>         SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+
>                 Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
>         SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>                 Changed: MRL- PresDet- LinkState-
>         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
> OBFF Not Supported ARIFwd+
>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
> OBFF Disabled ARIFwd-
>         LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-,
> Selectable De-emphasis: -6dB
>                  Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
>                  Compliance De-emphasis: -6dB
>         LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete-, EqualizationPhase1-
>                  EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
> Capabilities: [c0] Power Management version 3
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>         Address: 00000000fff41740  Data: 0008
> Capabilities: [100 v1] Advanced Error Reporting
>         UESta:  DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>         CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>         CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>         AERCap: First Error Pointer: 05, GenCap+ CGenEn- ChkCap+ ChkEn-
> Capabilities: [200 v1] Virtual Channel
>         Caps:   LPEVC=0 RefClk=100ns PATEntryBits=4
>         Arb:    Fixed- WRR32- WRR64- WRR128-
>         Ctrl:   ArbSelect=Fixed
>         Status: InProgress-
>         VC0:    Caps:   PATOffset=04 MaxTimeSlots=1 RejSnoopTrans-
>                 Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
>                 Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                 Status: NegoPending- InProgress-
>                 Port Arbitration Table <?>
> Capabilities: [320 v1] Access Control Services
>         ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+
> EgressCtrl+ DirectTrans+
>         ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
> EgressCtrl- DirectTrans-
> Capabilities: [330 v1] #12
> Kernel driver in use: pcieport
>
>
>>
>> Is there something in the spec that says this?  I'm looking at section
>> 6.7.3.2.  It says "If command completed events are supported, then
>> software must wait for a command to complete before issuing the next
>> command."  It's obvious that this is conditional on the "No Command
>> Completed Support" bit, but I don't see anything about a connection
>> with the other bits you mentioned.
>>
>> Is this related to an erratum in the Intel/AMD/Nvidia chipsets?  If
>> so, we need at least a comment about that in the source, and
>> preferably, some sort of quirk for these chipsets.
>>
>> Otherwise, we're likely to break this in the future when somebody
>> reading the code says "this doesn't correspond with what the spec
>> says."  In case we *do* break this in the future, I'd like to have a
>> bugzilla that contains the dmesg showing the problem and "lspci -vv"
>> so we have a hope of figuring out what systems this affects.
>>
>> This is similar to the issue Rajat is working on with
>> http://patchwork.ozlabs.org/patch/322387/ .  I haven't merged that
>> yet, but I probably will.  Is there any connection between that patch
>> and this issue?  Does this issue still happen even with Rajat's patch
>> applied?
>
> Also, wanted to my mention that my patch applies equally to the
> systems that exhibit this behavior and the other systems (Intel etc)
> that do not exhibit this behavior. That is because if for any reason
> (e.g. spurious interrupt) the condition "if (slot_status &
> PCI_EXP_SLTSTA_CC)" becomes trues in pcie_write_cmd(), we do not clear
> this bit but expect "cmd completed" events to be generated in future.
> In other words my patch is independent of Yinghai's patch, and the
> fact it helps my systems is a (desired) side effect.

Forgot to add, the ideal fix for my systems would have been if the
"no_cmd_complete" could be set to 0 for my HW in the pcie_init(), but
alas, that's not true today:

        if (NO_CMD_CMPL(ctrl) ||
            !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
            ctrl->no_cmd_complete = 1;

So I use Bjorn's patch for this:
http://www.spinics.net/lists/hotplug/msg05830.html



>
> Lastly, it seems to me that if a platform does not support the 4 HP
> elements mentioned above, the only other reason to write to Slot_Ctrl
> register will be to enable and disable notifications (during init /
> suspend / resume / reset_slot etc). So I was wondering if it would
> help if we explicitly cleared the CC bit after we enable disable
> hotplug notifications? If such a change was there, then probably we'd
> be able to cater both kind of systems, and making sure that none sees
> a delay at boot up. But yes, I do agree that this is going to create
> more haphazardness in the system (the code flow is already difficult
> to follow, specially for "cmd completed")
>
> Thanks,
>
> Rajat
>
>>
>> Bjorn
>>
>>> System with AMD/Nvidia chipset have same problem.
>>>
>>> So change to only wait when following bits get changed.
>>>         PCI_EXP_SLTCTL_EIC
>>>         PCI_EXP_SLTCTL_PCC
>>>         PCI_EXP_SLTCTL_PIC
>>>         PCI_EXP_SLTCTL_AIC
>>>
>>> With that we could set 16 seconds during booting, later with 32 sockets system
>>> with 64 pcie hotplug slots we could save 64 seconds.
>>>
>>> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>>>
>>> ---
>>>  drivers/pci/hotplug/pciehp_hpc.c |   11 ++++++-----
>>>  include/uapi/linux/pci_regs.h    |    5 +++++
>>>  2 files changed, 11 insertions(+), 5 deletions(-)
>>>
>>> Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -152,8 +152,8 @@ static void pcie_wait_cmd(struct control
>>>  static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>>>  {
>>>         struct pci_dev *pdev = ctrl_dev(ctrl);
>>> +       u16 slot_ctrl, old_slot_ctrl;
>>>         u16 slot_status;
>>> -       u16 slot_ctrl;
>>>
>>>         mutex_lock(&ctrl->ctrl_lock);
>>>
>>> @@ -181,9 +181,8 @@ static void pcie_write_cmd(struct contro
>>>                 }
>>>         }
>>>
>>> -       pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
>>> -       slot_ctrl &= ~mask;
>>> -       slot_ctrl |= (cmd & mask);
>>> +       pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &old_slot_ctrl);
>>> +       slot_ctrl = (old_slot_ctrl & ~mask) | (cmd & mask);
>>>         ctrl->cmd_busy = 1;
>>>         smp_mb();
>>>         pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
>>> @@ -191,7 +190,9 @@ static void pcie_write_cmd(struct contro
>>>         /*
>>>          * Wait for command completion.
>>>          */
>>> -       if (!ctrl->no_cmd_complete) {
>>> +       if (!ctrl->no_cmd_complete &&
>>> +           ((PCI_EXP_SLTCTL_WAIT_MASK & old_slot_ctrl) !=
>>> +            (PCI_EXP_SLTCTL_WAIT_MASK & slot_ctrl))) {
>>>                 int poll = 0;
>>>                 /*
>>>                  * if hotplug interrupt is not enabled or command
>>> Index: linux-2.6/include/uapi/linux/pci_regs.h
>>> ===================================================================
>>> --- linux-2.6.orig/include/uapi/linux/pci_regs.h
>>> +++ linux-2.6/include/uapi/linux/pci_regs.h
>>> @@ -535,6 +535,11 @@
>>>  #define  PCI_EXP_SLTCTL_PWR_OFF        0x0400 /* Power Off */
>>>  #define  PCI_EXP_SLTCTL_EIC    0x0800  /* Electromechanical Interlock Control */
>>>  #define  PCI_EXP_SLTCTL_DLLSCE 0x1000  /* Data Link Layer State Changed Enable */
>>> +/* only need to wait command complete for hpc related */
>>> +#define  PCI_EXP_SLTCTL_WAIT_MASK (PCI_EXP_SLTCTL_EIC | \
>>> +                                  PCI_EXP_SLTCTL_PCC | \
>>> +                                  PCI_EXP_SLTCTL_PIC | \
>>> +                                  PCI_EXP_SLTCTL_AIC)
>>>  #define PCI_EXP_SLTSTA         26      /* Slot Status */
>>>  #define  PCI_EXP_SLTSTA_ABP    0x0001  /* Attention Button Pressed */
>>>  #define  PCI_EXP_SLTSTA_PFD    0x0002  /* Power Fault Detected */
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux