Re: [PATCH] PCI/ASPM: Don't remove pcie_link_state until we stop the last device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 31, 2015 at 09:24:56AM +0800, wangyijing wrote:
> 在 2015/8/29 20:20, Bjorn Helgaas 写道:
> > On Thu, Jul 30, 2015 at 12:09:20PM +0800, Yijing Wang wrote:
> >> Now we stop the pci_bus->devices in reverse order, but in
> >> pcie_aspm_exit_link_state(), we only would do something when
> >> the device is the last one.
> >>
> >> void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> >> {
> >> 	...
> >> 	if (!list_is_last(&pdev->bus_list, &parent->subordinate->devices))
> ...

> > What seems wrong to me is that when we're removing device X, we free the
> > link_state for a *parent* of X.  I think the code would be much simpler and
> 
> What I am worried about here is if we hot remove a endpoint device here, and leave
> the parent device, so we don't call pci_aspm_exit_link_state() for this link anymore ?
> 
>                   pcie link
> downstream port ---------- endpoint device

If we remove the endpoint and leave the parent, we do call
pcie_aspm_exit_link_state() for the endpoint.  I'm proposing to change
that path so that when the endpoint is removed, we do whatever is
necessary for changing the ASPM configuration on the link, but leave
the ASPM structure allocated, since the parent still exists.

Today we allocate link_state only when a port has a downstream link
*and* there's a device on the other end of the link.  I'm proposing
that we allocate link_state even if there's no downstream device.

Then the alloc/free path is always tied directly to the parent.  We
can still change ASPM configuration as needed when downstream devices
are added or removed.  It's not a trivial change, but it seems
possible.

> > easier to get right if we freed the link_state for X when we remove X.
> > 
> > Can you look at fixing the problem that way?
> 
> I'm sorry, I don't have the platform now, this issue was found in product department, and they moved
> the platform away.

OK.  I'll drop this patch for now.  I definitely agree this is a
problem, but the fact that you don't have the platform any more
doesn't mean we need to throw in a band-aid instead of designing a
better solution.

This problem should be pretty easy to reproduce on any platform,
possibly with a little bit of scaffolding to tweak the topology.

> >> 		goto out;
> >> 	...
> >> }
> >>
> >> So if we have the following pcie tree, system may crash.
> >>
> >> [b7-bd]--+-02.0-[bb-bd]--+-00.0-[bc-bd]----01.0-[bd]----00.0  PLX Technology, Inc. Device 0002
> >>                          +-00.1  PLX Technology, Inc. Device 0002
> >>                          +-00.2  PLX Technology, Inc. Device 0002
> >>                          +-00.3  PLX Technology, Inc. Device 0002
> ...
> >> Signed-off-by: Yijing Wang <wangyijing@xxxxxxxxxx>
> >> CC: stable@xxxxxxxxxxxxxxx #3.4+
> > 
> > I need a clue about why you picked v3.4 here.  Is it because ac205b7bb72f
> > ("PCI: make sriov work with hotplug remove") appeared in v3.4?
> 
> Actually, this issue was found at v3.4 stable kernel, which was introduced in
> 3419c75e15f8 ("PCI: properly clean up ASPM link state on device remove") I think.

Did you bisect to this or figure it out by analysis?  It's good to
mention the actual commit that broke it so people backporting can
figure out if they need the fix.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux