RE: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by DPC

"Zhao, Haifeng" <haifeng.zhao@xxxxxxxxx> · Wed, 28 Apr 2021 01:42:12 +0000

I thought it would be merged into the 5.12 release.  A little disappointed  :<
What can I do to help?

Thanks,
Etan

-----Original Message-----
From: Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> 
Sent: Wednesday, April 28, 2021 8:40 AM
To: Lukas Wunner <lukas@xxxxxxxxx>; Bjorn Helgaas <helgaas@xxxxxxxxxx>; Williams, Dan J <dan.j.williams@xxxxxxxxx>
Cc: Zhao, Haifeng <haifeng.zhao@xxxxxxxxx>; Sinan Kaya <okaya@xxxxxxxxxx>; Raj, Ashok <ashok.raj@xxxxxxxxx>; Keith Busch <kbusch@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; Russell Currey <ruscur@xxxxxxxxxx>; Oliver O'Halloran <oohall@xxxxxxxxx>; Stuart Hayes <stuart.w.hayes@xxxxxxxxx>; Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by DPC

Hi Bjorn,

On 3/30/21 1:53 PM, Kuppuswamy, Sathyanarayanan wrote:
>> Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables 
>> the link upon an error and attempts to re-enable it when instructed 
>> by the DPC driver.
>>
>> A slot which is both DPC- and hotplug-capable is currently brought 
>> down by pciehp once DPC is triggered (due to the link change) and 
>> brought up on successful recovery.  That's undesirable, the slot 
>> should remain up so that the hotplugged device remains bound to its 
>> driver.  DPC notifies the driver of the error and of successful 
>> recovery in pcie_do_recovery() and the driver may then restore the 
>> device to working state.
>>
>> Moreover, Sinan points out that turning off slot power by pciehp may 
>> foil recovery by DPC:  Power off/on is a cold reset concurrently to 
>> DPC's warm reset.  Sathyanarayanan reports extended delays or failure 
>> in link retraining by DPC if pciehp brings down the slot.
>>
>> Fix by detecting whether a Link Down event is caused by DPC and 
>> awaiting recovery if so.  On successful recovery, ignore both the 
>> Link Down and the subsequent Link Up event.
>>
>> Afterwards, check whether the link is down to detect surprise-removal 
>> or another DPC event immediately after DPC recovery.  Ensure that the 
>> corresponding DLLSC event is not ignored by synthesizing it and 
>> invoking irq_wake_thread() to trigger a re-run of pciehp_ist().
>>
>> The IRQ threads of the hotplug and DPC drivers, pciehp_ist() and 
>> dpc_handler(), race against each other.  If pciehp is faster than 
>> DPC, it will wait until DPC recovery completes.
>>
>> Recovery consists of two steps:  The first step (waiting for link
>> disablement) is recognizable by pciehp through a set DPC Trigger 
>> Status bit.  The second step (waiting for link retraining) is 
>> recognizable through a newly introduced PCI_DPC_RECOVERING flag.
>>
>> If DPC is faster than pciehp, neither of the two flags will be set 
>> and pciehp may glean the recovery status from the new 
>> PCI_DPC_RECOVERED flag.  The flag is zero if DPC didn't occur at all, 
>> hence DLLSC events are not ignored by default.
>>
>> This commit draws inspiration from previous attempts to synchronize 
>> DPC with pciehp:
>>
>> By Sinan Kaya, August 2018:
>> https://lore.kernel.org/linux-pci/20180818065126.77912-1-okaya@kernel.org/
>>
>> By Ethan Zhao, October 2020:
>> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/
>>
>> By Sathyanarayanan Kuppuswamy, March 2021:
>> https://lore.kernel.org/linux-pci/59cb30f5e5ac6d65427ceaadf1012b2ba8dbf66c.1615606143.git.sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx/
>>
> Looks good to me. This patch fixes the reported issue in our environment.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
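
For anyone skimming the thread, the decision pciehp makes on a Link Down
event, as described in the quoted commit message, can be modeled roughly
as below.  This is only a standalone user-space sketch, not code from the
patch: the struct, the enum and both function names are hypothetical, and
the three booleans merely stand in for the DPC Trigger Status bit and the
PCI_DPC_RECOVERING / PCI_DPC_RECOVERED flags named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what pciehp can observe; not a kernel struct. */
struct port_state {
	bool dpc_trigger_status; /* step 1: link disablement still pending */
	bool dpc_recovering;     /* step 2: models PCI_DPC_RECOVERING */
	bool dpc_recovered;      /* models PCI_DPC_RECOVERED; false if no DPC */
};

/* Hypothetical outcomes for a Link Down (DLLSC) event. */
enum link_down_action {
	HANDLE_AS_HOTPLUG,   /* no DPC involved: bring the slot down as usual */
	WAIT_FOR_DPC,        /* pciehp won the race: wait, then re-evaluate */
	IGNORE_LINK_DOWN_UP, /* DPC recovered: keep the slot up */
};

static enum link_down_action classify_link_down(const struct port_state *s)
{
	assert(s != NULL);

	/* Either recovery step still in flight: DPC has not finished. */
	if (s->dpc_trigger_status || s->dpc_recovering)
		return WAIT_FOR_DPC;

	/* DPC already finished (it was faster) and recovery succeeded. */
	if (s->dpc_recovered)
		return IGNORE_LINK_DOWN_UP;

	/* Flag is zero if DPC did not occur: do not ignore the event. */
	return HANDLE_AS_HOTPLUG;
}

/*
 * After ignoring the events, a link that is down again means surprise
 * removal or another DPC event, so a DLLSC event must be synthesized.
 */
static bool must_synthesize_dllsc(enum link_down_action action,
				  bool link_active)
{
	return action == IGNORE_LINK_DOWN_UP && !link_active;
}
```

In this model a caller that gets WAIT_FOR_DPC would wait for recovery to
complete and then classify again; the actual waiting and the
irq_wake_thread() call are deliberately left out of the sketch.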

Any update on this patch? Is it queued for merge? One of our customers is
looking for this fix, so I am wondering about the status.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer