Re: [PATCH v2 1/2] PCI: pciehp: Add support for async hotplug with native AER and DPC/EDR


 



On 6/16/2023 10:31 AM, Lukas Wunner wrote:
On Mon, May 22, 2023 at 03:23:57PM -0700, Smita Koralahalli wrote:
On 5/16/2023 3:10 AM, Lukas Wunner wrote:
On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:


I'd recommend clearing only PCI_EXP_DEVSTA_FED in PCI_EXP_DEVSTA.

As for PCI_EXP_DPC_RP_PIO_STATUS, PCIe r6.1 sec 2.9.3 says that
during DPC, either UR or CA completions are returned depending on
the DPC Completion Control bit in the DPC Control register.
The kernel doesn't touch that bit, so it will contain whatever value
the BIOS has set. It seems fine to me to just clear all bits in
PCI_EXP_DPC_RP_PIO_STATUS, as you've done in your patch.

However, the RP PIO Status register is present only in Root Ports
that support RP Extensions for DPC, per PCIe r6.1 sec 7.9.14.6.
So you need to constrain that to "if (pdev->dpc_rp_extensions)".
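The two suggestions above (clear only the Fatal Error Detected bit in Device Status, and touch RP PIO Status only when RP Extensions are present) can be sketched as a small userspace model. This is not kernel code: struct mock_port, the mock_w1c*() helpers and mock_clear_surpdn_errors() are hypothetical names for illustration, and the bit value is an assumption that should be checked against pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of PCI_EXP_DEVSTA_FED; verify against pci_regs.h. */
#define MOCK_DEVSTA_FED 0x0004

/* Hypothetical stand-in for the relevant parts of struct pci_dev. */
struct mock_port {
	bool dpc_rp_extensions;  /* models pdev->dpc_rp_extensions */
	uint16_t devsta;         /* models PCI_EXP_DEVSTA */
	uint32_t rp_pio_status;  /* models PCI_EXP_DPC_RP_PIO_STATUS */
};

/* Write-1-to-clear: bits written as 1 are cleared, others are kept. */
static void mock_w1c16(uint16_t *reg, uint16_t val)
{
	*reg &= (uint16_t)~val;
}

static void mock_w1c32(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

static void mock_clear_surpdn_errors(struct mock_port *p)
{
	/* RP PIO Status exists only with RP Extensions for DPC. */
	if (p->dpc_rp_extensions)
		mock_w1c32(&p->rp_pio_status, ~0u);

	/* Clear only Fatal Error Detected, leave other status bits alone. */
	mock_w1c16(&p->devsta, MOCK_DEVSTA_FED);
}
```

In the model, a port without RP Extensions keeps its rp_pio_status field untouched, which mirrors the point that the register must not be accessed there.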


Okay, will make changes.


+	pci_aer_raw_clear_status(pdev);
+	pci_clear_surpdn_errors(pdev);
+
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
+			      PCI_EXP_DPC_STATUS_TRIGGER);
+}

Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end
of the function to wake up a pciehp handler waiting for DPC recovery?

I don't think so. The pciehp handler is, however, invoked simultaneously
due to the PDC or DLLSC state change, right? Let me know if I'm missing
anything here.

I think you need to follow the procedure in dpc_reset_link().

That function first waits for the link to go down, in accordance with
PCIe r6.1 sec 6.2.11:

	if (!pcie_wait_for_link(pdev, false))
	...

Note that the link should not come back up due to a newly hot-added
device until DPC Trigger Status is cleared.

The function then waits for the Root Port to quiesce:

	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
	...

And only then does the function clear DPC Trigger Status.

You definitely need to wake_up_all(&dpc_completed_waitqueue) because
pciehp may be waiting for DPC Trigger Status to clear.

And you need to "clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags)"
before calling wake_up_all().
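The ordering described above can be modelled in userspace as a sequence of logged steps. This is only a sketch of the success path: the failure handling elided by "..." in the snippets is not modelled, every name here (struct mock_port, mock_handle_surprise_removal(), the STEP_* values) is hypothetical, and placing the flag clear after the Trigger Status clear is an assumption, since the text only requires it to precede the wake-up.

```c
#include <stdbool.h>
#include <stddef.h>

enum mock_step {
	STEP_WAIT_LINK_DOWN,    /* models pcie_wait_for_link(pdev, false) */
	STEP_WAIT_RP_INACTIVE,  /* models dpc_wait_rp_inactive(pdev) */
	STEP_CLEAR_TRIGGER,     /* models clearing DPC Trigger Status */
	STEP_CLEAR_RECOVERED,   /* models clear_bit(PCI_DPC_RECOVERED, ...) */
	STEP_WAKE_WAITERS       /* models wake_up_all(&dpc_completed_waitqueue) */
};

/* Hypothetical stand-in for the port state touched by the handler. */
struct mock_port {
	bool dpc_rp_extensions;
	bool trigger_status;
	bool recovered;
	int wakeups;
	enum mock_step log[8];
	size_t nlog;
};

static void mock_log(struct mock_port *p, enum mock_step s)
{
	if (p->nlog < sizeof(p->log) / sizeof(p->log[0]))
		p->log[p->nlog++] = s;
}

static void mock_handle_surprise_removal(struct mock_port *p)
{
	/* 1. Wait for the link to go down. */
	mock_log(p, STEP_WAIT_LINK_DOWN);

	/* 2. Wait for the Root Port to quiesce (RP Extensions only). */
	if (p->dpc_rp_extensions)
		mock_log(p, STEP_WAIT_RP_INACTIVE);

	/* 3. Only now clear DPC Trigger Status. */
	p->trigger_status = false;
	mock_log(p, STEP_CLEAR_TRIGGER);

	/* 4. Clear the "recovered" flag before waking anyone. */
	p->recovered = false;
	mock_log(p, STEP_CLEAR_RECOVERED);

	/* 5. Wake a pciehp handler that may be waiting on DPC. */
	p->wakeups++;
	mock_log(p, STEP_WAKE_WAITERS);
}
```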



Okay. I did not consider that the pciehp handler may wait for DPC
Trigger Status to be cleared. In my case, both handlers were invoked by
their respective bit changes, and I did not come across a case where the
pciehp handler was waiting for DPC to complete.


+static bool dpc_is_surprise_removal(struct pci_dev *pdev)
+{
+	u16 status;
+
+	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
+
+	if (!(status & PCI_ERR_UNC_SURPDN))
+		return false;
+

You need an additional check for pdev->is_hotplug_bridge here.

And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.

Return false if either of them isn't set.
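As a rough illustration of the check as the reviewer words it here, the following userspace model returns false unless the port is a hotplug bridge, advertises Hot-Plug Surprise, and reports a Surprise Down error. Whether the HPS test belongs in the check is questioned elsewhere in this thread, so this is a sketch of one reading and not the agreed implementation. The struct, function name and bit values are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values; verify against pci_regs.h before relying on them. */
#define MOCK_ERR_UNC_SURPDN 0x00000020u  /* stands in for PCI_ERR_UNC_SURPDN */
#define MOCK_SLTCAP_HPS     0x00000020u  /* stands in for PCI_EXP_SLTCAP_HPS */

/* Hypothetical stand-in for the fields and registers consulted. */
struct mock_port {
	bool is_hotplug_bridge;  /* models pdev->is_hotplug_bridge */
	uint32_t sltcap;         /* models PCI_EXP_SLTCAP */
	uint32_t uncor_status;   /* models PCI_ERR_UNCOR_STATUS */
};

static bool mock_is_surprise_removal(const struct mock_port *p)
{
	/* Not a hotplug port: cannot be a surprise removal. */
	if (!p->is_hotplug_bridge)
		return false;

	/* Slot does not advertise Hot-Plug Surprise. */
	if (!(p->sltcap & MOCK_SLTCAP_HPS))
		return false;

	/* Finally, require the Surprise Down error bit itself. */
	return (p->uncor_status & MOCK_ERR_UNC_SURPDN) != 0;
}
```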

Return false only if PCI_EXP_SLTCAP isn't set, correct? PCI_EXP_SLTCAP_HPS
should be disabled if DPC is enabled.

The implementation note in 6.7.6 says:
"The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug
Surprise bit in the Slot Capabilities Register being Set, is deprecated
for use with async hot-plug. DPC is the recommended mechanism for supporting
async hot-plug."

Platform FW will disable the SLTCAP_HPS bit at boot time to enable async
hotplug on AMD devices.

Huh, is PCI_EXP_SLTCAP_HPS not set on the hotplug port in question?

If it's not set, why do you get Surprise Down Errors in the first place?

How do you bring down the slot without surprise-removal capability?
Via sysfs?


As per spec sec 6.7.6, "Either Downstream Port Containment (DPC) or the
Hot-Plug Surprise (HPS) mechanism may be used to support async removal as
part of an overall async hot-plug architecture".

Also, the implementation note below it conveys that HPS is deprecated and
DPC is the recommended mechanism. More details can be found in Appendix I,
I.1 Async Hot-Plug Initial Configuration:
...
If DPC capability then,
	If HPS bit not Set, use DPC
	Else attempt to Clear HPS bit (Section 6.7.4.4)
		If successful, use DPC
		Else use HPS
...
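The quoted excerpt from Appendix I can be read as a small decision function. The sketch below models only the branch shown in the excerpt; the case without DPC capability is elided in the quote, so the model returns an "unspecified" value there instead of guessing. All names (struct mock_slot, mock_try_clear_hps(), mock_pick_mechanism()) are hypothetical, and hps_clearable is an assumed stand-in for whether the attempt to clear HPS succeeds.

```c
#include <stdbool.h>

enum mock_mech {
	MOCK_MECH_UNSPECIFIED,  /* not covered by the quoted excerpt */
	MOCK_MECH_DPC,
	MOCK_MECH_HPS
};

/* Hypothetical model of a slot's async hot-plug related state. */
struct mock_slot {
	bool has_dpc;        /* port has the DPC capability */
	bool hps_set;        /* Hot-Plug Surprise bit currently Set */
	bool hps_clearable;  /* assumed: clearing HPS would succeed */
};

/* Attempt to clear HPS; report whether the bit is now clear. */
static bool mock_try_clear_hps(struct mock_slot *s)
{
	if (s->hps_clearable)
		s->hps_set = false;
	return !s->hps_set;
}

static enum mock_mech mock_pick_mechanism(struct mock_slot *s)
{
	if (!s->has_dpc)
		return MOCK_MECH_UNSPECIFIED;

	/* If HPS bit not Set, use DPC. */
	if (!s->hps_set)
		return MOCK_MECH_DPC;

	/* Else attempt to clear HPS: DPC on success, HPS otherwise. */
	return mock_try_clear_hps(s) ? MOCK_MECH_DPC : MOCK_MECH_HPS;
}
```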

So, this is likely a new feature support patch in which DPC supports async
removal. The HPS bit will be disabled by the BIOS if DPC is chosen as the
recommended mechanism to handle async removal.

I see the slot being brought down by a PDC or DLLSC event, which is
triggered alongside DPC:

pciehp_handle_presence_or_link_change() -> pciehp_disable_slot() -> __pciehp_disable_slot() -> remove_board()

But I want to clarify one thing: are you implying that PDC or DLLSC
shouldn't be triggered when HPS is disabled?

Thanks,
Smita


Probably check if SLTCAP_HPS bit is set and return false?

Quite the opposite!  If it's not set, return false.


Thanks,

Lukas



