Apologies for the delay in reply. Please let me know if I should resend
the patch.
Responses inline.
On 11/2/2022 4:21 PM, Bjorn Helgaas wrote:
On Tue, Nov 01, 2022 at 12:07:18AM +0000, Smita Koralahalli wrote:
Current systems support Firmware-First model for hot-plug. In this model,
I'm familiar with "firmware first" in the context of ACPI APEI.
Is there more "firmware first" language in the spec related to
hotplug? Or is this just the ACPI hotplug implemented by acpiphp? Or
is there something in the PCIe spec that talks about some firmware
interfaces needed in pciehp? If so, please cite the specific
sections. I see you cite PCIe r6.0, sec 6.7.6, below, but I don't see
the firmware mention there.
Firmware-first refers to AER/DPC firmware handling support, i.e., when FW
is in full control of AER/DPC. The terms "FW-First Hotplug/OS-First
Hotplug" might look confusing here as they don't exist in the spec. Will
rephrase them in the next revision.
In simple words, this patch follows the sequencing actions of a hot remove
when DPC is enabled and HPS is suppressed, and fixes the side effects of
the remove when the OS is in full control of AER/DPC.
Another relevant reference is the PCI Firmware Specification, Revision 3.3,
sec 4.6.12, "_DSM for Downstream Port Containment and Hot-Plug Surprise
Control", which allows for this flow:

  "When the operating system controls DPC, this section describes how
  the operating system can request the firmware to suppress Hot-Plug
  Surprise for a given DPC capable root port or a switch port..

  .. The operating system must evaluate this _DSM function when enabling
  or disabling DPC regardless of whether the operating system or system
  firmware owns DPC. If the operating system owns DPC then evaluating
  this _DSM function lets the system firmware know when the operating
  system is ready to handle DPC events and gives the system firmware an
  opportunity to clear the Hot-Plug Surprise bit, if applicable."
firmware holds the responsibility for executing the HW sequencing actions
on async or surprise add and removal events. Additionally, according to
Section 6.7.6 of PCIe Base Specification [1], firmware must also handle
the side-effects (DPC/AER events) reported on an async removal, which are
abstracted from the OS.
This model, however, poses issues while rolling out updates or fixing bugs
as the servers need to be brought down for firmware updates. Hence,
introduce support for OS-First hot-plug and AER/DPC. Here, OS is
responsible for handling async add and remove along with handling of
AER/DPC events which are generated as a side-effect of async remove.
The implementation is as follows: On an async remove a DPC is triggered as
a side-effect along with an MSI to the OS. Determine it's an async remove
by checking for DPC Trigger Status in DPC Status Register and Surprise
Down Error Status in AER Uncorrected Error Status to be non-zero. If true,
treat the DPC event as a side-effect of async remove, clear the error
status registers and continue with hot-plug tear down routines. If not,
follow the existing routine to handle AER/DPC errors.
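
The check described above can be illustrated with a small standalone
sketch. This is not the patch code: the helper names dpc_is_async_remove()
and dpc_classify() and the enum are hypothetical, and the register values
are passed in as plain integers rather than read from config space. The two
bit masks are assumed to match the definitions in
include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h */
#define PCI_EXP_DPC_STATUS_TRIGGER 0x0001     /* DPC Trigger Status */
#define PCI_ERR_UNC_SURPDN         0x00000020 /* Surprise Down Error Status */

/* Hypothetical result type: which path the DPC handler should take. */
enum dpc_action {
	DPC_ACTION_HOTPLUG_TEARDOWN, /* side-effect of an async remove */
	DPC_ACTION_ERROR_RECOVERY,   /* existing AER/DPC error handling */
};

/*
 * Hypothetical helper: treat the event as an async remove only if DPC
 * has triggered and the AER Uncorrectable Error Status reports a
 * Surprise Down error.
 */
static bool dpc_is_async_remove(uint16_t dpc_status, uint32_t aer_uncor_status)
{
	return (dpc_status & PCI_EXP_DPC_STATUS_TRIGGER) &&
	       (aer_uncor_status & PCI_ERR_UNC_SURPDN);
}

/*
 * Hypothetical dispatcher: on an async remove, model clearing the
 * Surprise Down status bit (the real register is write-1-to-clear) and
 * continue with hot-plug teardown; otherwise leave the status untouched
 * and fall back to the existing error recovery path.
 */
static enum dpc_action dpc_classify(uint16_t dpc_status,
				    uint32_t *aer_uncor_status)
{
	if (dpc_is_async_remove(dpc_status, *aer_uncor_status)) {
		*aer_uncor_status &= ~(uint32_t)PCI_ERR_UNC_SURPDN;
		return DPC_ACTION_HOTPLUG_TEARDOWN;
	}

	return DPC_ACTION_ERROR_RECOVERY;
}
```

With the values from the "Dmesg before" log below (DPC status 0x1f01, AER
error status 0x00000020), dpc_classify() would select the hot-plug teardown
path. Any other uncorrectable error bit on its own would go through the
existing recovery path.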
Dmesg before:
pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000
pcieport 0000:00:01.4: [ 5] SDES (First)
nvme nvme2: frozen state error detected, reset controller
pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
pcieport 0000:00:01.4: AER: subordinate device reset failed
pcieport 0000:00:01.4: AER: device recovery failed
pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
nvme2n1: detected capacity change from 1953525168 to 0
pci 0000:04:00.0: Removing from iommu group 49
Dmesg after:
pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
nvme1n1: detected capacity change from 1953525168 to 0
pci 0000:04:00.0: Removing from iommu group 37
pcieport 0000:00:01.4: pciehp: Slot(16): Card present
pci 0000:04:00.0: [8086:0a54] type 00 class 0x010802
pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
pci 0000:04:00.0: Max Payload Size set to 512 (was 128, max 512)
pci 0000:04:00.0: enabling Extended Tags
pci 0000:04:00.0: Adding to iommu group 37
pci 0000:04:00.0: BAR 0: assigned [mem 0xf2400000-0xf2403fff 64bit]
pcieport 0000:00:01.4: PCI bridge to [bus 04]
pcieport 0000:00:01.4: bridge window [io 0x1000-0x1fff]
pcieport 0000:00:01.4: bridge window [mem 0xf2400000-0xf24fffff]
pcieport 0000:00:01.4: bridge window [mem 0x20080800000-0x200809fffff 64bit pref]
nvme nvme1: pci function 0000:04:00.0
nvme 0000:04:00.0: enabling device (0000 -> 0002)
nvme nvme1: 128/0/0 default/read/poll queues
Remove any lines that are not specifically relevant, e.g., I'm not
sure whether the BARs, iommu, MPS, extended tags info is essential.
Please indent the quoted material two spaces so it doesn't look like
the narrative text.
Thanks for working on this!
Bjorn
Will do in v2. Thanks for reviewing this.
Smita