Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 20, 2018 at 05:36:03PM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 04:31:58PM -0600, Bjorn Helgaas wrote:
> > On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > > > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > > > > added probing of bridge support for 64 bit memory each time bridge is
> > > > > re-enumerated.
> > > > > 
> > > > > Unfortunately this probing is destructive if any device behind
> > > > > the bridge is in use at this time.
> > > > > 
> > > > > This was observed in the field, see
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > > > > and specifically
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > > > > 
> > > > > There's no real need to re-probe the bridge features as the
> > > > > registers in question never change - detect that using
> > > > > the memory flag being set (it's always set on the 1st pass since
> > > > > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > > > > Thus, only the first call will perform the disruptive probing and sets
> > > > > the resource flags as required - which we can be reasonably sure happens
> > > > > before any devices have been configured.
> > > > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > > > > Unfortunately I couldn't come up with a clean way to do it without a
> > > > > major probing code refactoring.
> > > > 
> > > > I'm OK with major probe code refactoring as long as it's done
> > > > carefully.  Doing a special-case fix like this solves the immediate
> > > > problem but adds to the long-term maintenance problem.
> > > > 
> > > > As far as I can tell, everything in pci_bridge_check_ranges() should
> > > > be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> > > > path, and pci_bridge_check_ranges() itself should be removed.
> > > > 
> > > > If that turns out to be impossible for some reason, we need a comment
> > > > explaining why.
> > > 
> > > Maybe possible but I am not sure how.
> > > 
> > > Here's why:
> > > 
> > > Upon hotplug we want to poke at new bridges if any, but not the old
> > > ones.  The issue is that e.g. with ACPI hotplug the event that Linux
> > > knows how to handle is by design a heavy weight bus rescan.
> > 
> > Yeah, it's tricky.  But I don't think it's PCI or ACPI that makes this
> > tricky; I think it's just the historical baggage of the PCI core
> > design that makes it hard.
> > 
> > Even in the ACPI hotplug path, I think we use this pci_scan_device()
> > path:
> > 
> >   pci_scan_device
> >     pci_setup_device
> >       case PCI_HEADER_TYPE_NORMAL:
> >         pci_read_bases(6)             # normal PCI BARs
> >       case PCI_HEADER_TYPE_BRIDGE:
> >         pci_read_bases(2)             # bridge BARs (not windows)
> > 
> > Unfortunately that path doesn't call pci_read_bridge_bases() to read
> > the bridge windows; that currently happens in pcibios_fixup_bus(),
> > which is only called from pci_scan_child_bus_extend().
> > 
> > This is a broken design because reading the bridge apertures is not at
> > all platform-specific, so it shouldn't be done in a pcibios hook.
> > And, more to the issue at hand, it shouldn't be done in
> > pci_scan_child_bus() either.  We might have to *update* the windows
> > when scanning child buses, but we should be able to do the work of
> > finding out what windows are implemented and their properties
> > somewhere in the pci_setup_device() path.
> > 
> > > Specifically 
> > > 
> > > pci_read_bridge_bases does not
> > > seem to be called on ACPI hotplug path.
> > > 
> > > Rather,
> > > 
> > > pci_assign_unassigned_root_bus_resources
> > > pci_assign_unassigned_bridge_resources
> > > 
> > > would be the two functions in question.
> > > 
> > > 
> > > Would above explanation be sufficient? If not, since I understand your
> > > reluctance to pile up hacks, would you be open to doing the suggested
> > > rewrite yourself? Me and xuyandong can help test it.
> > 
> > I'll be on vacation or holiday most of the time until the new year,
> > but I put a reminder on my calendar to look at this again then.  I'm
> > pretty sure we've tried to unravel this in the past, but I can't
> > remember what issues we tripped over.  Maybe we can make some progress
> > by restricting the problem we're trying to solve.
> > 
> > Thanks for bringing this up!  This is a wart in the PCI core that has
> > bothered me for a long time, and maybe this is the incentive we need
> > to make some progress on it.
> > 
> > Bjorn
> 
> 
> Sounds good, let me know when I can help with testing.

Ping.

I guess I'll just have put this patch in the RHEL kernel for now.
No point in keeping kernels crashing until we can
make it perfect ...

> > > > > Reported-by: xuyandong <xuyandong2@xxxxxxxxxx>
> > > > > Tested-by: xuyandong <xuyandong2@xxxxxxxxxx>
> > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > Cc: Yinghai Lu <yinghai@xxxxxxxxxx>
> > > > > Cc: Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx>
> > > > > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > > > > ---
> > > > > 
> > > > > Please review and consider for stable.
> > > > > 
> > > > > changes from v1:
> > > > >   comment and commit log updates to address comments by Bjorn.
> > > > > 
> > > > >  drivers/pci/setup-bus.c | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > > > > index ed960436df5e..d5c25d465d97 100644
> > > > > --- a/drivers/pci/setup-bus.c
> > > > > +++ b/drivers/pci/setup-bus.c
> > > > > @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> > > > >   struct resource *b_res;
> > > > >  
> > > > >   b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> > > > > +
> > > > > + /*
> > > > > +  * Don't re-check after this was called once already:
> > > > > +  * important since bridge might be in use.
> > > > > +  * Note: this is only reliable because as per spec all PCI to PCI
> > > > > +  * bridges support memory unconditionally so IORESOURCE_MEM is set.
> > > > > +  */
> > > > > + if (b_res[1].flags & IORESOURCE_MEM)
> > > > > +         return;
> > > > > +
> > > > >   b_res[1].flags |= IORESOURCE_MEM;
> > > > >  
> > > > >   pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > > > > -- 
> > > > > MST



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux