> -----Original Message-----
> From: Dexuan Cui <decui@xxxxxxxxxxxxx>
> Sent: Thursday, April 18, 2024 9:53 PM
> To: bhelgaas@xxxxxxxxxx; wei.liu@xxxxxxxxxx; KY Srinivasan
> <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>;
> lpieralisi@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx
> Cc: linux-hyperv@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Boqun
> Feng <Boqun.Feng@xxxxxxxxxxxxx>; Sunil Muthuswamy
> <sunilmut@xxxxxxxxxxxxx>; Saurabh Singh Sengar <ssengar@xxxxxxxxxxxxx>;
> Dexuan Cui <decui@xxxxxxxxxxxxx>
> Subject: [PATCH] PCI: Add a mutex to protect the global list
> pci_domain_busn_res_list
>
> There has been an effort to make the pci-hyperv driver support
> async-probing to reduce the boot time. With async-probing, multiple
> kernel threads can be running hv_pci_probe() -> create_root_hv_pci_bus() ->
> pci_scan_root_bus_bridge() -> pci_bus_insert_busn_res() at the same time to
> update the global list, causing list corruption.
>
> Add a mutex to protect the list.
>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> ---
>  drivers/pci/probe.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e19b79821dd6..1327fd820b24 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -37,6 +37,7 @@ LIST_HEAD(pci_root_buses);
>  EXPORT_SYMBOL(pci_root_buses);
>
>  static LIST_HEAD(pci_domain_busn_res_list);
> +static DEFINE_MUTEX(pci_domain_busn_res_list_lock);
>
>  struct pci_domain_busn_res {
>  	struct list_head list;
> @@ -47,14 +48,22 @@ struct pci_domain_busn_res {
>  static struct resource *get_pci_domain_busn_res(int domain_nr)
>  {
>  	struct pci_domain_busn_res *r;
> +	struct resource *ret;
>
> -	list_for_each_entry(r, &pci_domain_busn_res_list, list)
> -		if (r->domain_nr == domain_nr)
> -			return &r->res;
> +	mutex_lock(&pci_domain_busn_res_list_lock);
> +
> +	list_for_each_entry(r, &pci_domain_busn_res_list, list) {
> +		if (r->domain_nr == domain_nr) {
> +			ret = &r->res;
> +			goto out;
> +		}
> +	}
>
>  	r = kzalloc(sizeof(*r), GFP_KERNEL);
> -	if (!r)
> -		return NULL;
> +	if (!r) {
> +		ret = NULL;
> +		goto out;
> +	}
>
>  	r->domain_nr = domain_nr;
>  	r->res.start = 0;
> @@ -62,8 +71,10 @@ static struct resource *get_pci_domain_busn_res(int domain_nr)
>  	r->res.flags = IORESOURCE_BUS | IORESOURCE_PCI_FIXED;
>
>  	list_add_tail(&r->list, &pci_domain_busn_res_list);
> -
> -	return &r->res;
> +	ret = &r->res;
> +out:
> +	mutex_unlock(&pci_domain_busn_res_list_lock);
> +	return ret;
>  }

The patch is for common PCI code. So has this bug been there for a while?
Do you have a sample stack trace of the crash?

I checked pci-hyperv: it doesn't set .driver.probe_type, so
PROBE_DEFAULT_STRATEGY is in effect. driver_allows_async_probing() returns
false unless a kernel or module parameter requests async probing. So async
probing hasn't been exercised here.

If we change pci-hyperv's probe_type to PROBE_PREFER_ASYNCHRONOUS in the
future, how does that affect the probes of the underlying PCI devices within
the same device type? For example, the MANA driver doesn't set probe_type.
Will pci-hyperv's async probing cause async probing, or potentially
nondeterministic naming, for MANA devices?

Thanks,
- Haiyang
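
For reference, the lookup-or-create pattern that the patch serializes can be sketched in user space. This is only an illustration with pthreads, not kernel code: struct domain_res, get_domain_res() and count_entries_after_concurrent_lookups() are hypothetical names, and a plain singly linked list stands in for the kernel's list_head.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct pci_domain_busn_res. */
struct domain_res {
	struct domain_res *next;
	int domain_nr;
};

static struct domain_res *res_list;
static pthread_mutex_t res_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Return the entry for domain_nr, creating it if it does not exist.
 * The lock is held across both the lookup and the insert, so two
 * threads cannot both miss the lookup and add duplicate entries.
 */
static struct domain_res *get_domain_res(int domain_nr)
{
	struct domain_res *r;

	pthread_mutex_lock(&res_list_lock);

	for (r = res_list; r; r = r->next)
		if (r->domain_nr == domain_nr)
			goto out;

	r = calloc(1, sizeof(*r));
	if (!r)
		goto out;	/* returns NULL on allocation failure */

	r->domain_nr = domain_nr;
	r->next = res_list;
	res_list = r;
out:
	pthread_mutex_unlock(&res_list_lock);
	return r;
}

/* Thread body: stands in for one asynchronous probe of a root bus. */
static void *probe_thread(void *arg)
{
	return get_domain_res(*(int *)arg);
}

/*
 * Start nthreads concurrent lookups of the same domain, wait for them,
 * and return the total number of entries on the list afterwards
 * (-1 if the thread array cannot be allocated).
 */
static int count_entries_after_concurrent_lookups(int nthreads, int domain_nr)
{
	pthread_t *tids;
	struct domain_res *r;
	int i, started = 0, count = 0;

	assert(nthreads > 0);
	tids = calloc(nthreads, sizeof(*tids));
	if (!tids)
		return -1;

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, probe_thread, &domain_nr))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	free(tids);

	pthread_mutex_lock(&res_list_lock);
	for (r = res_list; r; r = r->next)
		count++;
	pthread_mutex_unlock(&res_list_lock);

	return count;
}
```

With the lock in place, any number of threads asking for the same domain should leave exactly one entry for it on the list. Without the lock, two threads can both miss the lookup and both insert, which is the kind of corruption the commit message describes.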