On Sat, Nov 16, 2013 at 01:53:56PM +0900, Tejun Heo wrote:
> Hello, Bjorn.
>
> On Fri, Nov 15, 2013 at 05:28:20PM -0700, Bjorn Helgaas wrote:
> > It would be better to fix PCI so we don't call VF driver .probe() methods
> > from inside a PF driver .probe() method, but that's a bigger project.
>
> Yeah, if pci doesn't need the recursion, we can simply revert it to
> restore the lockdep annotation on work_on_cpu().
>
> > @@ -293,7 +293,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> >  	   its local memory on the right node without any need to
> >  	   change it. */
> >  	node = dev_to_node(&dev->dev);
> > -	if (node >= 0) {
> > +	preempt_disable();
> > +
> > +	if (node >= 0 && node != numa_node_id()) {
>
> A bit of comment here would be nice, but yeah, I think this should work.
> Can you please also queue the revert of c2fda509667b ("workqueue:
> allow work_on_cpu() to be called recursively") after this patch?
> Please feel free to add my acked-by.

OK, below are the two patches (Alex's fix + the revert) I propose to
merge.  Unless there are objections, I'll ask Linus to pull these
before v3.13-rc1.

Bjorn

commit 84f23f99b507c2c9247f47d3db0f71a3fd65e3a3
Author: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
Date:   Mon Nov 18 10:59:59 2013 -0700

    PCI: Avoid unnecessary CPU switch when calling driver .probe() method
    
    If we are already on a CPU local to the device, call the driver
    .probe() method directly without using work_on_cpu().
    
    This is a workaround for a lockdep warning in the following scenario:
    
      pci_call_probe
        work_on_cpu(cpu, local_pci_probe, ...)
          driver .probe
            pci_enable_sriov
              ...
                pci_bus_add_device
                  ...
                    pci_call_probe
                      work_on_cpu(cpu, local_pci_probe, ...)
    
    It would be better to fix PCI so we don't call VF driver .probe()
    methods from inside a PF driver .probe() method, but that's a bigger
    project.
    
    [bhelgaas: disable preemption, open bugzilla, rework comments & changelog]
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=65071
    Link: http://lkml.kernel.org/r/CAE9FiQXYQEAZ=0sG6+2OdffBqfLS9MpoN1xviRR9aDbxPxcKxQ@xxxxxxxxxxxxxx
    Link: http://lkml.kernel.org/r/20130624195942.40795.27292.stgit@xxxxxxxxxxxxxxxxxxxxxxxx
    Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
    Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
    Acked-by: Tejun Heo <tj@xxxxxxxxxx>

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 9042fdbd7244..add04e70ac2a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -288,12 +288,24 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	int error, node;
 	struct drv_dev_and_id ddi = { drv, dev, id };
 
-	/* Execute driver initialization on node where the device's
-	   bus is attached to.  This way the driver likely allocates
-	   its local memory on the right node without any need to
-	   change it. */
+	/*
+	 * Execute driver initialization on node where the device is
+	 * attached.  This way the driver likely allocates its local memory
+	 * on the right node.
+	 */
 	node = dev_to_node(&dev->dev);
-	if (node >= 0) {
+	preempt_disable();
+
+	/*
+	 * On NUMA systems, we are likely to call a PF probe function using
+	 * work_on_cpu().  If that probe calls pci_enable_sriov() (which
+	 * adds the VF devices via pci_bus_add_device()), we may re-enter
+	 * this function to call the VF probe function.  Calling
+	 * work_on_cpu() again will cause a lockdep warning.  Since VFs are
+	 * always on the same node as the PF, we can work around this by
+	 * avoiding work_on_cpu() when we're already on the correct node.
+	 */
+	if (node >= 0 && node != numa_node_id()) {
 		int cpu;
 
 		get_online_cpus();
@@ -305,6 +317,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 		put_online_cpus();
 	} else
 		error = local_pci_probe(&ddi);
+
+	preempt_enable();
 	return error;
 }
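For illustration, here is a rough sketch of the pattern the patch uses
(not part of the patch itself: probe_on_node() and its arguments are
invented, and the preempt_disable()/preempt_enable() pair that keeps
the numa_node_id() comparison stable against task migration is omitted
for brevity):

#include <linux/cpu.h>		/* get_online_cpus(), put_online_cpus() */
#include <linux/cpumask.h>	/* cpumask_any_and(), cpu_online_mask, nr_cpu_ids */
#include <linux/topology.h>	/* numa_node_id(), cpumask_of_node() */
#include <linux/workqueue.h>	/* work_on_cpu() */

/*
 * Sketch: run probe_fn(arg) on a CPU attached to dev_node, but skip the
 * work_on_cpu() hop when the current CPU is already on that node.
 * Skipping the hop is what avoids the nested work_on_cpu() call and
 * hence the lockdep warning.
 */
static long probe_on_node(int dev_node, long (*probe_fn)(void *), void *arg)
{
	long ret;
	int cpu;

	if (dev_node < 0 || dev_node == numa_node_id())
		return probe_fn(arg);	/* already local; call directly */

	get_online_cpus();
	cpu = cpumask_any_and(cpumask_of_node(dev_node), cpu_online_mask);
	if (cpu < nr_cpu_ids)
		ret = work_on_cpu(cpu, probe_fn, arg);	/* hop to the node */
	else
		ret = probe_fn(arg);	/* node has no online CPU */
	put_online_cpus();

	return ret;
}

In the SR-IOV case the VF probe takes the first branch: the VF is on
the same node as the PF, whose probe is already running there, so
pci_call_probe() never re-enters work_on_cpu().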
commit 2dde5285d06370b2004613ee4fd253e95622af20
Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Date:   Mon Nov 18 11:00:29 2013 -0700

    Revert "workqueue: allow work_on_cpu() to be called recursively"
    
    This reverts commit c2fda509667b0fda4372a237f5a59ea4570b1627.
    
    c2fda509667b removed the lockdep annotation from work_on_cpu() to work
    around the PCI path that calls work_on_cpu() from within a
    work_on_cpu() work item (PF driver .probe() method ->
    pci_enable_sriov() -> add VFs -> VF driver .probe() method).
    
    84f23f99b507 ("PCI: Avoid unnecessary CPU switch when calling driver
    .probe() method") avoids that recursive work_on_cpu() use in a
    different way, so this revert restores the work_on_cpu() lockdep
    annotation.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
    Acked-by: Tejun Heo <tj@xxxxxxxxxx>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 987293d03ebc..5690b8eabfbc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2840,19 +2840,6 @@ already_gone:
 	return false;
 }
 
-static bool __flush_work(struct work_struct *work)
-{
-	struct wq_barrier barr;
-
-	if (start_flush_work(work, &barr)) {
-		wait_for_completion(&barr.done);
-		destroy_work_on_stack(&barr.work);
-		return true;
-	} else {
-		return false;
-	}
-}
-
 /**
  * flush_work - wait for a work to finish executing the last queueing instance
  * @work: the work to flush
@@ -2866,10 +2853,18 @@ static bool __flush_work(struct work_struct *work)
  */
 bool flush_work(struct work_struct *work)
 {
+	struct wq_barrier barr;
+
 	lock_map_acquire(&work->lockdep_map);
 	lock_map_release(&work->lockdep_map);
 
-	return __flush_work(work);
+	if (start_flush_work(work, &barr)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else {
+		return false;
+	}
 }
 EXPORT_SYMBOL_GPL(flush_work);
 
@@ -4814,14 +4809,7 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
 	schedule_work_on(cpu, &wfc.work);
-
-	/*
-	 * The work item is on-stack and can't lead to deadlock through
-	 * flushing.  Use __flush_work() to avoid spurious lockdep warnings
-	 * when work_on_cpu()s are nested.
-	 */
-	__flush_work(&wfc.work);
-
+	flush_work(&wfc.work);
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
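To see what the restored annotation catches, a throwaway test module
along these lines (entirely invented; not part of either patch) should
produce a "possible recursive locking" report on a CONFIG_PROVE_LOCKING
kernel once the revert is applied, since all work_on_cpu() work items
share a single lockdep class (INIT_WORK_ONSTACK uses one static key per
call site, and work_on_cpu() has exactly one):

#include <linux/module.h>
#include <linux/smp.h>		/* raw_smp_processor_id() */
#include <linux/workqueue.h>	/* work_on_cpu() */

/* Stand-in for a VF driver .probe() method. */
static long inner_fn(void *arg)
{
	return 0;
}

/*
 * Stand-in for a PF driver .probe() method: calling work_on_cpu() from
 * inside a work_on_cpu() work item makes flush_work() "acquire" the
 * lockdep class that the outer work item is still holding.  It does not
 * actually deadlock (another worker runs the inner item), but lockdep
 * flags it as possible recursive locking.
 */
static long outer_fn(void *arg)
{
	return work_on_cpu(raw_smp_processor_id(), inner_fn, NULL);
}

static int __init nested_wocpu_init(void)
{
	/* Assumes CPU 0 is online (it is the boot CPU). */
	pr_info("nested work_on_cpu() returned %ld\n",
		work_on_cpu(0, outer_fn, NULL));
	return 0;
}

static void __exit nested_wocpu_exit(void)
{
}

module_init(nested_wocpu_init);
module_exit(nested_wocpu_exit);
MODULE_LICENSE("GPL");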