On Wed, Nov 13, 2013 at 04:38:06PM +0900, Tejun Heo wrote:
> Hey, guys.
>
> cc'ing people from "workqueue, pci: INFO: possible recursive locking
> detected" thread.
>
>   http://thread.gmane.org/gmane.linux.kernel/1525779
>
> So, to resolve that issue, we ripped out lockdep annotation from
> work_on_cpu() and cgroup is now experiencing deadlock involving
> work_on_cpu().  It *could* be that workqueue is actually broken or
> memcg is looping but it doesn't seem like a very good idea to not have
> lockdep annotation around work_on_cpu().
>
> IIRC, there was one pci code path which called work_on_cpu()
> recursively.  Would it be possible for that path to use something like
> work_on_cpu_nested(XXX, depth) so that we can retain lockdep
> annotation on work_on_cpu()?

I'm open to changing the way pci_call_probe() works, but my opinion is
that the PCI path that causes trouble is a broken design, and we
shouldn't complicate the work_on_cpu() interface just to accommodate
that broken design.

The problem is that when a PF .probe() method calls
pci_enable_sriov(), we add new VF devices and call *their* .probe()
methods before the PF .probe() method completes.  That is ugly and
error-prone.

When we call .probe() methods for the VFs, we're obviously already on
the correct node, because the VFs are on the same node as the PF, so I
think the best short-term fix is Alexander's patch to avoid
work_on_cpu() when we're already on the correct node -- something like
the (untested) patch below.

Bjorn


PCI: Avoid unnecessary CPU switch when calling driver .probe() method

From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>

If we are already on a CPU local to the device, call the driver .probe()
method directly without using work_on_cpu().

This is a workaround for a lockdep warning in the following scenario:

  pci_call_probe
    work_on_cpu(cpu, local_pci_probe, ...)
      driver .probe
        pci_enable_sriov
          ...
            pci_bus_add_device
              ...
                pci_call_probe
                  work_on_cpu(cpu, local_pci_probe, ...)

It would be better to fix PCI so we don't call VF driver .probe() methods
from inside a PF driver .probe() method, but that's a bigger project.

This patch is due to Alexander Duyck <alexander.h.duyck@xxxxxxxxx>; I merely
added the preemption disable.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=65071
Link: http://lkml.kernel.org/r/CAE9FiQXYQEAZ=0sG6+2OdffBqfLS9MpoN1xviRR9aDbxPxcKxQ@xxxxxxxxxxxxxx
Link: http://lkml.kernel.org/r/20130624195942.40795.27292.stgit@xxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
---
 drivers/pci/pci-driver.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 454853507b7e..accae06aa79a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -293,7 +293,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	   its local memory on the right node without any need to
 	   change it. */
 	node = dev_to_node(&dev->dev);
-	if (node >= 0) {
+	preempt_disable();
+
+	if (node >= 0 && node != numa_node_id()) {
 		int cpu;
 
 		get_online_cpus();
@@ -305,6 +307,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 		put_online_cpus();
 	} else
 		error = local_pci_probe(&ddi);
+
+	preempt_enable();
 	return error;
 }
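
For anyone who wants to see the effect of the node check without booting a kernel, here is a rough userspace model of the call chain from the changelog. It is only a sketch under stated assumptions: struct fake_dev, fake_probe(), dispatch_to_node(), call_probe() and max_nesting_for_pf_with_vf() are all made-up names for illustration, "dispatching" is modeled as a plain function call that switches a current_node variable, and nothing here touches workqueues, lockdep, or preemption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; none of these are kernel APIs. */
struct fake_dev {
	int node;             /* NUMA node of the device, -1 if unknown */
	struct fake_dev *vf;  /* VF added by this device's probe, or NULL */
};

static int current_node;       /* node of the "CPU" we are running on */
static int skip_when_local;    /* 1 = patched check, 0 = original check */
static int dispatch_depth;     /* live nesting of modeled dispatches */
static int max_dispatch_depth; /* deepest nesting seen so far */

static int call_probe(struct fake_dev *dev);

/* Models a driver .probe(): a PF adds its VF before it returns. */
static int fake_probe(struct fake_dev *dev)
{
	if (dev->vf)
		return call_probe(dev->vf);
	return 0;
}

/* Models work_on_cpu(): run the probe on the device's node and wait. */
static int dispatch_to_node(struct fake_dev *dev)
{
	int saved = current_node;
	int ret;

	current_node = dev->node;
	if (++dispatch_depth > max_dispatch_depth)
		max_dispatch_depth = dispatch_depth;
	ret = fake_probe(dev);
	dispatch_depth--;
	current_node = saved;
	return ret;
}

/* Models the decision made in pci_call_probe(). */
static int call_probe(struct fake_dev *dev)
{
	assert(dev != NULL);

	/* Original check: node >= 0.  Patched check: also not already local. */
	if (dev->node >= 0 &&
	    !(skip_when_local && dev->node == current_node))
		return dispatch_to_node(dev);

	return fake_probe(dev);
}

/* Probe a PF on node 1 that adds one VF, starting from node 0. */
static int max_nesting_for_pf_with_vf(int patched)
{
	struct fake_dev vf = { .node = 1, .vf = NULL };
	struct fake_dev pf = { .node = 1, .vf = &vf };

	current_node = 0;
	skip_when_local = patched;
	dispatch_depth = 0;
	max_dispatch_depth = 0;
	call_probe(&pf);
	return max_dispatch_depth;
}
```

In this model, max_nesting_for_pf_with_vf(0) reaches a nesting depth of 2 (the VF probe is dispatched from inside the PF's dispatch), while max_nesting_for_pf_with_vf(1) stays at 1 because the VF is already on the PF's node and is probed directly.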