----- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko s.miroshnichenko@xxxxxxxxx wrote:

> This is yet another approach to fix an old [1-2] concurrency issue, when:
>  - two or more devices are being hot-added into a bridge which was
>    initially empty;
>  - a bridge with two or more devices is being hot-added;
>  - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
>
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
>
>  CPU0                                        CPU1
>
>  pci_enable_device_mem()                     pci_enable_device_mem()
>    pci_enable_bridge()                         pci_enable_bridge()
>      pci_is_enabled()
>        return false;
>      atomic_inc_return(enable_cnt)
>      Start actual enabling the bridge
>      ...                                         pci_is_enabled()
>      ...                                           return true;
>      ...                                       Start memory requests <-- FAIL
>      ...
>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding per-device mutexes and
> preventing the dev->enable_cnt from incrementing early.
>
> CC: Srinath Mannam <srinath.mannam@xxxxxxxxxxxx>
> CC: Marta Rybczynska <mrybczyn@xxxxxxxxx>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@xxxxxxxxx>
>
> [1]
> https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@xxxxxxxxxxxx/T/#u
>     [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
>
> [2]
> https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@xxxxxxxxx/T/#u
>     [RFC PATCH] nvme: avoid race-conditions when enabling devices
> ---
>  drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
>  drivers/pci/probe.c |  1 +
>  include/linux/pci.h |  1 +
>  3 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1b27b5af3d55..e7f8c354e644 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  	struct pci_dev *bridge;
>  	int retval;
>
> +	mutex_lock(&dev->enable_mutex);
> +
>  	bridge = pci_upstream_bridge(dev);
>  	if (bridge)
>  		pci_enable_bridge(bridge);
> @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  	if (pci_is_enabled(dev)) {
>  		if (!dev->is_busmaster)
>  			pci_set_master(dev);
> +		mutex_unlock(&dev->enable_mutex);
>  		return;
>  	}
>

This code is used by numerous drivers. When we saw this issue, I wondered
whether there are use cases where this (or pci_disable_device()) is called
with interrupts disabled. It seems that it shouldn't be, but a BUG_ON or an
error when someone calls it that way would help with debugging.

Marta
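
For illustration only, the ordering that the quoted patch aims for can be
modelled in user space. This is a minimal sketch with pthreads, not kernel
code: struct fake_dev, fake_dev_init(), fake_enable() and enable_thread() are
hypothetical names, and the int fields merely stand in for dev->enable_mutex,
dev->enable_cnt and the PCI_COMMAND_MEMORY bit. The point it tries to show is
that the count is incremented only after the bit is written, with both steps
under the per-device mutex, so a second caller cannot see the device as
enabled too early.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for struct pci_dev; not the kernel type. */
struct fake_dev {
	pthread_mutex_t enable_mutex;	/* models dev->enable_mutex from the patch */
	int enable_cnt;			/* models dev->enable_cnt */
	int command_memory;		/* models the PCI_COMMAND_MEMORY bit */
	int hw_writes;			/* number of simulated register writes */
};

static void fake_dev_init(struct fake_dev *dev)
{
	pthread_mutex_init(&dev->enable_mutex, NULL);
	dev->enable_cnt = 0;
	dev->command_memory = 0;
	dev->hw_writes = 0;
}

/*
 * Enable the device and return the state of the command bit as seen by
 * this caller. The whole check-enable-count sequence runs under the mutex.
 */
static int fake_enable(struct fake_dev *dev)
{
	int seen;

	pthread_mutex_lock(&dev->enable_mutex);
	if (dev->enable_cnt == 0) {
		/* First caller: write the "register" before publishing. */
		dev->command_memory = 1;
		dev->hw_writes++;
	}
	/* The bit must already be set when the count becomes visible. */
	assert(dev->command_memory);
	dev->enable_cnt++;
	seen = dev->command_memory;
	pthread_mutex_unlock(&dev->enable_mutex);

	return seen;
}

/* Thread entry point: returns the device pointer if the bit was seen set. */
static void *enable_thread(void *arg)
{
	return fake_enable(arg) ? arg : NULL;
}
```

Calling fake_enable() from two threads should leave enable_cnt at 2 with a
single simulated register write. On the debugging question: as far as I know,
mutex_lock() in the kernel already calls might_sleep(), so with
CONFIG_DEBUG_ATOMIC_SLEEP a call with interrupts disabled should produce a
warning once the mutex is in place.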