On 3/26/19 10:00 PM, Bjorn Helgaas wrote: > [+cc Srinath, Marta, LKML] > > On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote: >> CPU0 CPU1 >> >> pci_enable_device_mem() pci_enable_device_mem() >> pci_enable_bridge() pci_enable_bridge() >> pci_is_enabled() >> return false; >> atomic_inc_return(enable_cnt) >> Start actual enabling the bridge >> ... pci_is_enabled() >> ... return true; >> ... Start memory requests <-- FAIL >> ... >> Set the PCI_COMMAND_MEMORY bit <-- Must wait for this >> >> This patch protects the pci_enable/disable_device() and pci_enable_bridge() >> with mutexes. > > This is a subtle issue that we've tried to fix before, but we've never > had a satisfactory solution, so I hope you've figured out the right > fix. > > I'll include some links to previous discussion. This patch is very > similar to [2], which we didn't actually apply. We did apply the > patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling > upstream bridges"), but it caused the regressions reported in [4,5], > so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while > enabling upstream bridges""). > Thanks for the links, I wasn't aware of these discussions and patches! On PowerNV this issue is partially hidden by db2173198b95 ("powerpc/powernv/pci: Work around races in PCI bridge enabling"), and on x86 BIOS pre-initializes all the bridges, so it doesn't reproduce until hotplugging in a hotplugged bridge. This patch is indeed similar to 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"), but instead of a single static mutex it adds per-device mutexes and prevents the dev->enable_cnt from incrementing too early. So it's not needed anymore to carefully select a moment safe enough to enable the device. Serge > I think the underlying design problem is that we have a driver for > device B calling pci_enable_device(), and it is changing the state of > device A (an upstream bridge). The model generally is that a driver > should only touch the device it is bound to. > > It's tricky to get the locking right when several children of device A > all need to operate on A. > > That's all to say I'll have to think carefully about this particular > patch, so I'll go on to the others and come back to this one. > > Bjorn > > [1] https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.mannam@xxxxxxxxxxxx/T/#u > [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch > > [2] https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.mannam@xxxxxxxxxxxx/T/#u > [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch > > [3] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@xxxxxxxxxxxx/T/#u > [RFC PATCH v3] pci: Concurrency issue during pci enable bridge > > [4] https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u > [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream bridges concurrently > > [5] https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231b21@xxxxxxxxx/T/#u > iwlwifi firmware load broken in current -git > > [6] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@xxxxxxxxx/T/#u > [RFC PATCH] nvme: avoid race-conditions when enabling devices > >> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@xxxxxxxxx> >> --- >> drivers/pci/pci.c | 26 ++++++++++++++++++++++---- >> drivers/pci/probe.c | 1 + >> include/linux/pci.h | 1 + >> 3 files changed, 24 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index f006068be209..895201d4c9e6 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev) >> struct pci_dev *bridge; >> int retval; >> >> + mutex_lock(&dev->enable_mutex); >> + >> bridge = pci_upstream_bridge(dev); >> if (bridge) >> pci_enable_bridge(bridge); >> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev) >> if (pci_is_enabled(dev)) { >> if (!dev->is_busmaster) >> pci_set_master(dev); >> + mutex_unlock(&dev->enable_mutex); >> return; >> } >> >> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev) >> pci_err(dev, "Error enabling bridge (%d), continuing\n", >> retval); >> pci_set_master(dev); >> + mutex_unlock(&dev->enable_mutex); >> } >> >> static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) >> { >> struct pci_dev *bridge; >> + /* Enable-locking of bridges is performed within the pci_enable_bridge() */ >> + bool need_lock = !dev->subordinate; >> int err; >> int i, bars = 0; >> >> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) >> dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); >> } >> >> - if (atomic_inc_return(&dev->enable_cnt) > 1) >> + if (need_lock) >> + mutex_lock(&dev->enable_mutex); >> + if (pci_is_enabled(dev)) { >> + if (need_lock) >> + mutex_unlock(&dev->enable_mutex); >> return 0; /* already enabled */ >> + } >> >> bridge = pci_upstream_bridge(dev); >> if (bridge) >> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) >> bars |= (1 << i); >> >> err = do_pci_enable_device(dev, bars); >> - if (err < 0) >> - atomic_dec(&dev->enable_cnt); >> + if (err >= 0) >> + atomic_inc(&dev->enable_cnt); >> + if (need_lock) >> + mutex_unlock(&dev->enable_mutex); >> return err; >> } >> >> @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev) >> if (dr) >> dr->enabled = 0; >> >> + mutex_lock(&dev->enable_mutex); >> dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0, >> "disabling already-disabled device"); >> >> - if (atomic_dec_return(&dev->enable_cnt) != 0) >> + if (atomic_dec_return(&dev->enable_cnt) != 0) { >> + mutex_unlock(&dev->enable_mutex); >> return; >> + } >> >> do_pci_disable_device(dev); >> >> dev->is_busmaster = 0; >> + >> + mutex_unlock(&dev->enable_mutex); >> } >> EXPORT_SYMBOL(pci_disable_device); >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index 2ec0df04e0dc..977a127ce791 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus) >> INIT_LIST_HEAD(&dev->bus_list); >> dev->dev.type = &pci_dev_type; >> dev->bus = pci_bus_get(bus); >> + mutex_init(&dev->enable_mutex); >> >> return dev; >> } >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 77448215ef5b..cb2760a31fe2 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -419,6 +419,7 @@ struct pci_dev { >> unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ >> pci_dev_flags_t dev_flags; >> atomic_t enable_cnt; /* pci_enable_device has been called */ >> + struct mutex enable_mutex; >> >> u32 saved_config_space[16]; /* Config space saved at suspend time */ >> struct hlist_head saved_cap_space; >> -- >> 2.20.1 >>
Attachment:
signature.asc
Description: OpenPGP digital signature