On Wed, Oct 25, 2017 at 02:29:40PM +0300, Mika Westerberg wrote: > During surprise hot-unplug the device is not accessible anymore and > register reads return 0xffffffff. When that happens pciehp_unconfigure_device() > may inadvertently think the device below the bridge may be a display > device of somesort as reading PCI_BRIDGE_CONTROL register also returns > 0xff. This results failure of the remove operation: > > pciehp 0000:00:1c.0:pcie004: Slot(0): Link Down > pciehp 0000:00:1c.0:pcie004: Slot(0): Card present > pciehp 0000:00:1c.0:pcie004: Cannot remove display device 0000:01:00.0 > > Because of this the hierarchy is left untouched preventing further > hotplug operations. > > Now, it is not clear why the check is there in the first place and why > we would like to prevent removing a bridge if it has PCI_BRIDGE_CTL_VGA > set. In case of surprise hot-unplug, it would not even be possible to > prevent the removal. It may be due to the fact that pciehp_pci.c pretty > much copies similar implementation from shpchp_pci.c and this check was > just left there in the code without further thinking if it is actually > needed at all. > > Given this and the issue described above, I think it makes sense to drop > the whole PCI_BRIDGE_CONTROL check from pciehp_unconfigure_device() > which is what this patch does. > > Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> > --- > The previous version of the patch can be found here: > > https://patchwork.kernel.org/patch/10024145/ > > Changes from the previous version: > > * Drop the whole PCI_BRIDGE_CONTROL check > * Update patch subject to reflect that > > drivers/pci/hotplug/pciehp_pci.c | 12 ------------ > 1 file changed, 12 deletions(-) > > diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c > index 2a1ca020cf5a..acc360d1a1fb 100644 > --- a/drivers/pci/hotplug/pciehp_pci.c > +++ b/drivers/pci/hotplug/pciehp_pci.c > @@ -79,7 +79,6 @@ int pciehp_configure_device(struct slot *p_slot) > int pciehp_unconfigure_device(struct slot *p_slot) > { > int rc = 0; > - u8 bctl = 0; > u8 presence = 0; > struct pci_dev *dev, *temp; > struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate; > @@ -101,17 +100,6 @@ int pciehp_unconfigure_device(struct slot *p_slot) > list_for_each_entry_safe_reverse(dev, temp, &parent->devices, > bus_list) { > pci_dev_get(dev); > - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) { > - pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl); > - if (bctl & PCI_BRIDGE_CTL_VGA) { > - ctrl_err(ctrl, > - "Cannot remove display device %s\n", > - pci_name(dev)); > - pci_dev_put(dev); > - rc = -EINVAL; > - break; > - } > - } > if (!presence) { > pci_dev_set_disconnected(dev, NULL); > if (pci_has_subordinate(dev)) As you mention, shpchp_unconfigure_device() contains the same code; is there any reason not to remove it from there as well? I know you probably can't test shpchp, and neither can I, but it looks like it has the same issue, and I don't want to avoid fixing it there just for want of testing. I wish we knew more about why that PCI_BRIDGE_CTL_VGA check was there in the first place. I think it was added by Dely Sy in 2004 [1]. LinkedIn thinks he's still at Intel; any chance you could ping him and see if he has any insight? My guess is that it was originally in a path that didn't have to worry about surprise unplug. But I still don't know why it would be a problem. Bjorn [1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=c16b4b14d980