On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote: > On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote: > > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote: > > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit") > > > added probing of bridge support for 64 bit memory each time bridge is > > > re-enumerated. > > > > > > Unfortunately this probing is destructive if any device behind > > > the bridge is in use at this time. > > > > > > This was observed in the field, see > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html > > > and specifically > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html > > > > > > There's no real need to re-probe the bridge features as the > > > registers in question never change - detect that using > > > the memory flag being set (it's always set on the 1st pass since > > > all PCI2PCI bridges support memory forwarding) and skip the probing. > > > Thus, only the first call will perform the disruptive probing and sets > > > the resource flags as required - which we can be reasonably sure happens > > > before any devices have been configured. > > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer. > > > Unfortunately I couldn't come up with a clean way to do it without a > > > major probing code refactoring. > > > > I'm OK with major probe code refactoring as long as it's done > > carefully. Doing a special-case fix like this solves the immediate > > problem but adds to the long-term maintenance problem. > > > > As far as I can tell, everything in pci_bridge_check_ranges() should > > be done once at enumeration-time, e.g., in the pci_read_bridge_bases() > > path, and pci_bridge_check_ranges() itself should be removed. > > > > If that turns out to be impossible for some reason, we need a comment > > explaining why. > > Maybe possible but I am not sure how. > > Here's why: > > Upon hotplug we want to poke at new bridges if any, but not the old > ones. The issue is that e.g. with ACPI hotplug the event that Linux > knows how to handle is by design a heavy weight bus rescan. Yeah, it's tricky. But I don't think it's PCI or ACPI that makes this tricky; I think it's just the historical baggage of the PCI core design that makes it hard. Even in the ACPI hotplug path, I think we use this pci_scan_device() path: pci_scan_device pci_setup_device case PCI_HEADER_TYPE_NORMAL: pci_read_bases(6) # normal PCI BARs case PCI_HEADER_TYPE_BRIDGE: pci_read_bases(2) # bridge BARs (not windows) Unfortunately that path doesn't call pci_read_bridge_bases() to read the bridge windows; that currently happens in pcibios_fixup_bus(), which is only called from pci_scan_child_bus_extend(). This is a broken design because reading the bridge apertures is not at all platform-specific, so it shouldn't be done in a pcibios hook. And, more to the issue at hand, it shouldn't be done in pci_scan_child_bus() either. We might have to *update* the windows when scanning child buses, but we should be able to do the work of finding out what windows are implemented and their properties somewhere in the pci_setup_device() path. > Specifically > > pci_read_bridge_bases does not > seem to be called on ACPI hotplug path. > > Rather, > > pci_assign_unassigned_root_bus_resources > pci_assign_unassigned_bridge_resources > > would be the two functions in question. > > > Would above explanation be sufficient? If not, since I understand your > reluctance to pile up hacks, would you be open to doing the suggested > rewrite yourself? Me and xuyandong can help test it. I'll be on vacation or holiday most of the time until the new year, but I put a reminder on my calendar to look at this again then. I'm pretty sure we've tried to unravel this in the past, but I can't remember what issues we tripped over. Maybe we can make some progress by restricting the problem we're trying to solve. Thanks for bringing this up! This is a wart in the PCI core that has bothered me for a long time, and maybe this is the incentive we need to make some progress on it. Bjorn > > > Reported-by: xuyandong <xuyandong2@xxxxxxxxxx> > > > Tested-by: xuyandong <xuyandong2@xxxxxxxxxx> > > > Cc: stable@xxxxxxxxxxxxxxx > > > Cc: Yinghai Lu <yinghai@xxxxxxxxxx> > > > Cc: Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> > > > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx> > > > --- > > > > > > Please review and consider for stable. > > > > > > changes from v1: > > > comment and commit log updates to address comments by Bjorn. > > > > > > drivers/pci/setup-bus.c | 10 ++++++++++ > > > 1 file changed, 10 insertions(+) > > > > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c > > > index ed960436df5e..d5c25d465d97 100644 > > > --- a/drivers/pci/setup-bus.c > > > +++ b/drivers/pci/setup-bus.c > > > @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus) > > > struct resource *b_res; > > > > > > b_res = &bridge->resource[PCI_BRIDGE_RESOURCES]; > > > + > > > + /* > > > + * Don't re-check after this was called once already: > > > + * important since bridge might be in use. > > > + * Note: this is only reliable because as per spec all PCI to PCI > > > + * bridges support memory unconditionally so IORESOURCE_MEM is set. > > > + */ > > > + if (b_res[1].flags & IORESOURCE_MEM) > > > + return; > > > + > > > b_res[1].flags |= IORESOURCE_MEM; > > > > > > pci_read_config_word(bridge, PCI_IO_BASE, &io); > > > -- > > > MST