RE: HW power fault defect cause system hang on kernel 5.4.y

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Stuart,
The patch you attached does not work for me, the logic should be ok but I have not figured out why. But a loop counter actually helps mitigate this issue. Thank you very much!

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 88b996764ff9..3d2c336ff740 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -529,7 +529,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
        struct controller *ctrl = (struct controller *)dev_id;
        struct pci_dev *pdev = ctrl_dev(ctrl);
        struct device *parent = pdev->dev.parent;
-       u16 status, events = 0;
+       u16 status, events, read_retry_count = 0;
+       u8 READ_RETRY_MAX = 6;

        /*
         * Interrupts only occur in D3hot or shallower and only if enabled
@@ -585,7 +586,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
                return IRQ_NONE;
        }

-       if (status) {
+       if (status && (read_retry_count < READ_RETRY_MAX)) {
                pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);

                /*
@@ -594,8 +595,10 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
                 * So re-read the Slot Status register in case a bit was set
                 * between read and write.
                 */
-               if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode)
+               if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode) {
+                       read_retry_count++;
                        goto read_status;
+               }
        }
Regards.
Joseph

Regards.
Joseph

-----Original Message-----
From: stuart hayes <stuart.w.hayes@xxxxxxxxx> 
Sent: Tuesday, November 9, 2021 11:37 PM
To: Bao, Joseph <joseph.bao@xxxxxxxxx>; Bjorn Helgaas <helgaas@xxxxxxxxxx>
Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Lukas Wunner <lukas@xxxxxxxxx>
Subject: Re: HW power fault defect cause system hang on kernel 5.4.y



On 11/9/2021 1:59 AM, Bao, Joseph wrote:
> Hi Lukas/Stuart,
> Want to follow up with you whether the system hang is expected when HW has a defect keeping PCI_EXP_SLTSTA_PFD always HIGH.
> 
> 
> Regards
> Joseph
> 

It does appear that the code will hang when pciehp_isr sees PFD high and power_fault_detected isn't yet set, if PFD doesn't clear when a 1 is written to it.  It will continue to loop trying to clear it, and power_fault_detected won't get set until after it gets through this loop.

It wouldn't be hard to modify that code to only attempt to clear each bit once.  I wouldn't expect the same event bit to get set twice within this loop, so this might fix it (I did not test).  Alternately, a loop counter could be added to prevent it from looping more than some arbitrary number
(6?) of times in case of stuck bits.

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..3e502b4e8ef7 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -594,7 +594,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
  	struct controller *ctrl = (struct controller *)dev_id;
  	struct pci_dev *pdev = ctrl_dev(ctrl);
  	struct device *parent = pdev->dev.parent;
-	u16 status, events = 0;
+	u16 changed, status, events = 0;
  
  	/*
  	 * Interrupts only occur in D3hot or shallower and only if enabled @@ -643,6 +643,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
  	if (ctrl->power_fault_detected)
  		status &= ~PCI_EXP_SLTSTA_PFD;
  
+	changed = status ^ events;
  	events |= status;
  	if (!events) {
  		if (parent)
@@ -659,7 +660,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
  		 * So re-read the Slot Status register in case a bit was set
  		 * between read and write.
  		 */
-		if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode)
+		if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode && changed)
  			goto read_status;
  	}
  


> -----Original Message-----
> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>> Sent: Wednesday, November 3, 
> 2021 6:34 AM
> To: Bao, Joseph <joseph.bao@xxxxxxxxx>
> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; 
> linux-kernel@xxxxxxxxxxxxxxx; Stuart Hayes <stuart.w.hayes@xxxxxxxxx>; 
> Lukas Wunner <lukas@xxxxxxxxx>
> Subject: Re: HW power fault defect cause system hang on kernel 5.4.y
> 
> [+cc Stuart, author of 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt 
> race"), Lukas, pciehp expert]
> 
> On Tue, Nov 02, 2021 at 03:45:00AM +0000, Bao, Joseph wrote:
>> Hi, dear kernel developer,
>>
>> Recently we encounter system hang (dead spinlock) when move to kernel 
>> linux-5.4.y.
>>
>> Finally, we use bisect to locate the suspicious commit 
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=4667358dab9cc07da044d5bc087065545b1000df.
> 
> 4667358dab9c backported upstream commit 8edf5332c393 ("PCI: pciehp:
> Fix MSI interrupt race") to v5.4.69 just over a year ago.
> 
>> Our system has some HW defect, which will wrongly set 
>> PCI_EXP_SLTSTA_PFD high, and this commit will lead to infinite loop 
>> jumping to read_status (no chance to clear status PCI_EXP_SLTSTA_PFD 
>> bit since ctrl is not updated), I know this is our HW defect, but 
>> this commit makes kernel trapped in this isr function and leads to 
>> kernel hang (then the user could not get useful information to show 
>> what's wrong), which I think is not expected behavior, so I would 
>> like to report to you for discussion.
> 
> I guess this happens because the first time we handle PFD,
> pciehp_ist() sets "ctrl->power_fault_detected = 1", and when power_fault_detected is set, pciehp_isr() won't clear PFD from PCI_EXP_SLTSTA?
> 
> It looks like the only place we clear power_fault_detected is in pciehp_power_on_slot(), and I don't think we call that unless we have a presence detect or link status change.
> 
> It would definitely be nice if we could arrange so this hardware defect didn't cause a kernel hang.
> 
> I think the diff below is the backport of 8edf5332c393 ("PCI: pciehp:
> Fix MSI interrupt race").
> 
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c
>> b/drivers/pci/hotplug/pciehp_hpc.c
>> index 356786a3b7f4b..88b996764ff95 100644
>> ---
>> a/https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tr
>> e
>> e/drivers/pci/hotplug/pciehp_hpc.c?h=linux-5.4.y&id=ca767cf0152d18fc2
>> 9
>> 9cde85b18d1f46ac21e1ba
>> +++ b/https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.gi
>> +++ t 
>> +++ /tree/drivers/pci/hotplug/pciehp_hpc.c?h=linux-5.4.y&id=4667358da
>> +++ b
>> +++ 9cc07da044d5bc087065545b1000df
>> @@ -529,7 +529,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>   	struct controller *ctrl = (struct controller *)dev_id;
>>   	struct pci_dev *pdev = ctrl_dev(ctrl);
>>   	struct device *parent = pdev->dev.parent;
>> -	u16 status, events;
>> +	u16 status, events = 0;
>>   
>>   	/*
>>   	 * Interrupts only occur in D3hot or shallower and only if enabled 
>> @@ -554,6 +554,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>   		}
>>   	}
>>   
>> +read_status:
>>   	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
>>   	if (status == (u16) ~0) {
>>   		ctrl_info(ctrl, "%s: no response from device\n", __func__); @@
>> -566,24 +567,37 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>   	 * Slot Status contains plain status bits as well as event
>>   	 * notification bits; right now we only want the event bits.
>>   	 */
>> -	events = status & (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>> -			   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
>> -			   PCI_EXP_SLTSTA_DLLSC);
>> +	status &= PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>> +		  PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
>> +		  PCI_EXP_SLTSTA_DLLSC;
>>   
>>   	/*
>>   	 * If we've already reported a power fault, don't report it again
>>   	 * until we've done something to handle it.
>>   	 */
>>   	if (ctrl->power_fault_detected)
>> -		events &= ~PCI_EXP_SLTSTA_PFD;
>> +		status &= ~PCI_EXP_SLTSTA_PFD;
>>   
>> +	events |= status;
>>   	if (!events) {
>>   		if (parent)
>>   			pm_runtime_put(parent);
>>   		return IRQ_NONE;
>>   	}
>>   
>> -	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
>> +	if (status) {
>> +		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
>> +
>> +		/*
>> +		 * In MSI mode, all event bits must be zero before the port
>> +		 * will send a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).
>> +		 * So re-read the Slot Status register in case a bit was set
>> +		 * between read and write.
>> +		 */
>> +		if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode)
>> +			goto read_status;
>> +	}
>> +
>>   	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
>>   	if (parent)
>>   		pm_runtime_put(parent);




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux