Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Bjorn,

On Tue, Nov 5, 2013 at 4:25 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> On Tue, Nov 05, 2013 at 02:33:28PM -0800, Rajat Jain wrote:
>> In case of a spurious "cmd completed", pcie_write_cmd() does not
>> clear it, but yet expects more "cmd completed" events to be generated.
>> This does not happen because the previous (spurious) event has not
>> been acknowledged. Fix that.
>>
>> Signed-off-by: Rajat Jain <rajatjain@xxxxxxxxxxx>
>> Signed-off-by: Guenter Roeck <groeck@xxxxxxxxxxx>
>> ---
>> This is how I saw it in action: my controller does not implement any
>> hot-plug elements (LED, power ctrl, EMI etc) but still supports Command
>> completed bit.
>>  - During initialization,
>>       pcie_disable_notification()
>>       -> pcie_write_cmd()
>>          -> writes to Slot control register
>>             -> which causes PCI_EXP_SLTSTA_CC to get set, which is not
>>                cleared, because IRQ is not generated (we just disabled
>>                notifications).
>>  - After some time,
>>       pcie_enable_notification()
>>       -> pcie_write_cmd()
>>           -> finds PCI_EXP_SLTSTA_CC is set, assumes it is spurious.
>>           -> Does not clear it, yet expects more command completed
>>              events to be generated (never happens).
>
> I'm not sure this "cmd completed" is actually spurious.  Spec section
> 7.8.10 is very clear that any write to Slot Control must cause a
> hot-plug command to be generated (if the port is hot-plug capable).

I agree, and I think I am witnessing a "genuine" command completion
(caused by disabling of notifications).

>
> Can you collect "lspci -vv" output for your controller?
 >

Sure:

PCI bridge: Integrated Device Technology, Inc. Device 807a (rev 02)
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=02, secondary=50, subordinate=5f, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: 8c000000-8fffffff
Prefetchable memory behind bridge: 00000000b0600000-00000000b07fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00
        DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                ExtTag+ RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #3, Speed 5GT/s, Width x4, ASPM L0s L1, Latency
L0 <4us, L1 <4us
                ClockPM- Surprise+ LLActRep+ BwNot+
        LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk-
DLActive- BWMgmt- ABWMgmt-
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
                Slot #3, PowerLimit 0.000W; Interlock- NoCompl-
        SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+
                Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt+ PresDet- Interlock-
                Changed: MRL- PresDet- LinkState-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
OBFF Not Supported ARIFwd+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
OBFF Disabled ARIFwd-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-,
Selectable De-emphasis: -6dB
                 Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                 Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                 EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
Capabilities: [c0] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000ff041740  Data: 0003
Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [200 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=4
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=04 MaxTimeSlots=1 RejSnoopTrans-
                Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                Status: NegoPending- InProgress-
                Port Arbitration Table <?>
Capabilities: [320 v1] Access Control Services
        ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+
EgressCtrl+ DirectTrans+
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
EgressCtrl- DirectTrans-
Capabilities: [330 v1] #12
Kernel driver in use: pcieport

In addition, here is what the pciehp driver spews out:

Hotplug Controller:
  Seg/Bus/Dev/Func/IRQ : 0000:02:03.0 IRQ 21
  Vendor ID            : 0x111d
  Device ID            : 0x807a
  Subsystem ID         : 0x0000
  Subsystem Vendor ID  : 0x0000
  PCIe Cap offset      : 0x40
  PCI resource [7]     : [io  0x13000-0x13fff]
  PCI resource [8]     : [mem 0xc0c000000-0xc0fffffff]
  PCI resource [9]     : [mem 0xc30600000-0xc307fffff 64bit pref]
Slot Capabilities      : 0x00180040
  Physical Slot Number : 3
  Attention Button     :  no
  Power Controller     :  no
  MRL Sensor           :  no
  Attention Indicator  :  no
  Power Indicator      :  no
  Hot-Plug Surprise    :  no
  EMI Present          :  no
  Command Completed    : yes
Slot Status            : 0x0010
Slot Control           : 0x0000
Link Active Reporting supported
HPC vendor_id 111d device_id 807a ss_vid 0 ss_did 0
Registering domain:bus:dev=0000:50:00 sun=3
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec

The last two output lines are part of the problem. Each controller
takes 1 second to initialized as the message shows.

>  I assume
> you're hitting this case in pcie_init() (added by 5808639bfa98
> ("pciehp: fix slow probing")):
>
>         /*
>          * Controller doesn't notify of command completion if the "No
>          * Command Completed Support" bit is set in Slot Capability
>          * register or the controller supports none of power
>          * controller, attention led, power led and EMI.
>          */
>         if (NO_CMD_CMPL(ctrl) ||
>             !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
>             ctrl->no_cmd_complete = 1;
>
> and we're setting "no_cmd_complete = 1" for your controller, which
> keeps us from waiting for completion in pcie_write_cmd().
>
> I'm dubious about the assertion that a controller without power
> control, attention LED, power LED, or EMI can't support command
> completion.  I don't see anything in the spec to that effect.

I agree, specially since I am seeing this contrary behavior (to that
assertion) in action :-)

>
> Since you're seeing PCI_EXP_SLTSTA_CC=1, your controller *should*
> support Command Completion notification and PCI_EXP_SLTCAP_NCCS should
> be 0 (per Table 7-20), so I wonder what happens on your system if you
> change pcie_init() so it leaves "ctrl->no_cmd_complete = 0" instead?
> Does it work correctly then?

Yes, it work well after that (Both the error output lines go away as
well). And the 1 second delay per controller is also gone.

>
> I know we can't just drop the "!(POWER_CTRL(ctrl) | ...)" tests
> because we don't want to reintroduce the problem fixed by
> 5808639bfa98, but I wonder if we can find a better fix that addresses
> both problems.

Please let me know if any one has any suggestions? IMHO, this patch
should yet not cause the original problem to reappear because we are
clearing the bit *only* when we know that it is already set (and no
way to clear it because at least in my controller no subsequent
interrupts can be generated). But I do not have enough understanding,
and will be glad to get any pointers.

Thanks,

- Rajat

>
> Bjorn
>
>>
>>  drivers/pci/hotplug/pciehp_hpc.c |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 5b8d749..ba8e06f 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>>       }
>>
>>       if (slot_status & PCI_EXP_SLTSTA_CC) {
>> +             pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
>>               if (!ctrl->no_cmd_complete) {
>>                       /*
>>                        * After 1 sec and CMD_COMPLETED still not set, just
>> --
>> 1.7.9.5
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux