Hi,
On 1/29/25 1:29 PM, Bjorn Helgaas wrote:
On Sun, Dec 22, 2024 at 07:39:08PM -0800, Daniel Stodden wrote:
From: Daniel Stodden <daniel.stodden@xxxxxxxxx>
If possible include where you noticed this issue and any dmesg
logs related to it.
Before change 456d8aa37d0f "Disable ASPM on MFD function removal to
avoid use-after-free", we would free the ASPM link only after the last
function on the bus pertaining to the given link was removed.
That was too late. If function 0 is removed before sibling function,
link->downstream would point to free'd memory after.
After above change, we freed the ASPM parent link state upon any
function removal on the bus pertaining to a given link.
That is too early. If the link is to a PCIe switch with MFD on the
upstream port, then removing functions other than 0 first would free a
link which still remains parent_link to the remaining downstream
ports.
Is this specific to a Switch? It seems like removal of any
multi-function device might trip over this.
The resulting GPFs are especially frequent during hot-unplug, because
pciehp removes devices on the link bus in reverse order.
Do you have a sample GPF? If we can include a few pertinent lines
here it may help people connect a problem with this fix.
On that switch, function 0 is the virtual P2P bridge to the internal
bus. Free exactly when function 0 is removed -- before the parent link
is obsolete, but after all subordinate links are gone.
I agree this is a problem.
IIUC we allocate pcie_link_state when enumerating a device on the
upstream end of a link, i.e., a Root Port or Switch Downstream Port,
but we deallocate it when removing a device on the downstream end of
the link, i.e., an Endpoint or Switch Upstream Port. This asymmetry
seems kind of prone to error.
Also, struct pcie_link_state contains redundant information, e.g., the
pdev, downstream, parent, and sibling members basically duplicate the
hierarchy already described by the struct pci_bus parent, self, and
devices members. Redundancy like this is also error prone.
This patch is attractive because it's a very small fix, and maybe we
should use it for that reason. But I do think we're basically
papering over a pretty serious design defect in ASPM.
I think we'd ultimately be better off if we allocated pcie_link_state
either as a member of struct pci_dev (instead of using a pointer), or
perhaps in pci_setup_device() when we set up the rest of the
bridge-related things and made it live as long as the bridge device.
Actually, if we removed all the redundant pointers in struct
pcie_link_state, it would be smaller than a single pointer, so there'd
be no reason to allocate it dynamically.
Of course this would be a much bigger change to aspm.c.
Agree. I also think creating the link at the setup would be a better
approach.
But If this issue is seen in any real hardware now and need a urgent
fix, may be we can merge this change for now.
Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Do we have any public problem reports we could mention here? I'm
actually a little surprised that this hasn't been a bigger problem,
given that 456d8aa37d0f appeared in v6.5 in Aug 2023. But maybe there
is some unusual topology or hot-unplug involved?
Signed-off-by: Daniel Stodden <dns@xxxxxxxxxx>
---
drivers/pci/pcie/aspm.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index e0bc90597dca..8ae7c75b408c 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1273,16 +1273,16 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
parent_link = link->parent;
/*
- * link->downstream is a pointer to the pci_dev of function 0. If
- * we remove that function, the pci_dev is about to be deallocated,
- * so we can't use link->downstream again. Free the link state to
- * avoid this.
+ * Free the parent link state, no later than function 0 (i.e.
+ * link->downstream) being removed.
*
- * If we're removing a non-0 function, it's possible we could
- * retain the link state, but PCIe r6.0, sec 7.5.3.7, recommends
- * programming the same ASPM Control value for all functions of
- * multi-function devices, so disable ASPM for all of them.
+ * Do not free free the link state any earlier. If function 0
Above line is not very clear. May be you can remove " Do not free free
the link state any earlier"
+ * is a switch upstream port, this link state is parent_link
+ * to all subordinate ones.
*/
+ if (pdev != link->downstream)
+ goto out;
+
pcie_config_aspm_link(link, 0);
list_del(&link->sibling);
free_link_state(link);
@@ -1293,6 +1293,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
pcie_config_aspm_path(parent_link);
}
+ out:
mutex_unlock(&aspm_lock);
up_read(&pci_bus_sem);
}
--
2.47.0
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer