Re: Possible regression with cgroups in 3.11

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Nov 16, 2013 at 01:53:56PM +0900, Tejun Heo wrote:
> Hello, Bjorn.
> 
> On Fri, Nov 15, 2013 at 05:28:20PM -0700, Bjorn Helgaas wrote:
> > It would be better to fix PCI so we don't call VF driver .probe() methods
> > from inside a PF driver .probe() method, but that's a bigger project.
> 
> Yeah, if pci doesn't need the recursion, we can simply revert restore
> the lockdep annoation on work_on_cpu().
> 
> > @@ -293,7 +293,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> >  	   its local memory on the right node without any need to
> >  	   change it. */
> >  	node = dev_to_node(&dev->dev);
> > -	if (node >= 0) {
> > +	preempt_disable();
> > +
> > +	if (node >= 0 && node != numa_node_id()) {
> 
> A bit of comment here would be nice but yeah I think this should work.
> Can you please also queue the revert of c2fda509667b ("workqueue:
> allow work_on_cpu() to be called recursively") after this patch?
> Please feel free to add my acked-by.

OK, below are the two patches (Alex's fix + the revert) I propose to
merge.  Unless there are objections, I'll ask Linus to pull these
before v3.13-rc1.

Bjorn



commit 84f23f99b507c2c9247f47d3db0f71a3fd65e3a3
Author: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
Date:   Mon Nov 18 10:59:59 2013 -0700

    PCI: Avoid unnecessary CPU switch when calling driver .probe() method
    
    If we are already on a CPU local to the device, call the driver .probe()
    method directly without using work_on_cpu().
    
    This is a workaround for a lockdep warning in the following scenario:
    
      pci_call_probe
        work_on_cpu(cpu, local_pci_probe, ...)
          driver .probe
            pci_enable_sriov
              ...
                pci_bus_add_device
                  ...
                    pci_call_probe
                      work_on_cpu(cpu, local_pci_probe, ...)
    
    It would be better to fix PCI so we don't call VF driver .probe() methods
    from inside a PF driver .probe() method, but that's a bigger project.
    
    [bhelgaas: disable preemption, open bugzilla, rework comments & changelog]
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=65071
    Link: http://lkml.kernel.org/r/CAE9FiQXYQEAZ=0sG6+2OdffBqfLS9MpoN1xviRR9aDbxPxcKxQ@xxxxxxxxxxxxxx
    Link: http://lkml.kernel.org/r/20130624195942.40795.27292.stgit@xxxxxxxxxxxxxxxxxxxxxxxx
    Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
    Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
    Acked-by: Tejun Heo <tj@xxxxxxxxxx>

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 9042fdbd7244..add04e70ac2a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -288,12 +288,24 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	int error, node;
 	struct drv_dev_and_id ddi = { drv, dev, id };
 
-	/* Execute driver initialization on node where the device's
-	   bus is attached to.  This way the driver likely allocates
-	   its local memory on the right node without any need to
-	   change it. */
+	/*
+	 * Execute driver initialization on node where the device is
+	 * attached.  This way the driver likely allocates its local memory
+	 * on the right node.
+	 */
 	node = dev_to_node(&dev->dev);
-	if (node >= 0) {
+	preempt_disable();
+
+	/*
+	 * On NUMA systems, we are likely to call a PF probe function using
+	 * work_on_cpu().  If that probe calls pci_enable_sriov() (which
+	 * adds the VF devices via pci_bus_add_device()), we may re-enter
+	 * this function to call the VF probe function.  Calling
+	 * work_on_cpu() again will cause a lockdep warning.  Since VFs are
+	 * always on the same node as the PF, we can work around this by
+	 * avoiding work_on_cpu() when we're already on the correct node.
+	 */
+	if (node >= 0 && node != numa_node_id()) {
 		int cpu;
 
 		get_online_cpus();
@@ -305,6 +317,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 		put_online_cpus();
 	} else
 		error = local_pci_probe(&ddi);
+
+	preempt_enable();
 	return error;
 }
 
commit 2dde5285d06370b2004613ee4fd253e95622af20
Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Date:   Mon Nov 18 11:00:29 2013 -0700

    Revert "workqueue: allow work_on_cpu() to be called recursively"
    
    This reverts commit c2fda509667b0fda4372a237f5a59ea4570b1627.
    
    c2fda509667b removed lockdep annotation from work_on_cpu() to work around
    the PCI path that calls work_on_cpu() from within a work_on_cpu() work item
    (PF driver .probe() method -> pci_enable_sriov() -> add VFs -> VF driver
    .probe method).
    
    84f23f99b507 ("PCI: Avoid unnecessary CPU switch when calling driver
    .probe() method) avoids that recursive work_on_cpu() use in a different
    way, so this revert restores the work_on_cpu() lockdep annotation.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
    Acked-by: Tejun Heo <tj@xxxxxxxxxx>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 987293d03ebc..5690b8eabfbc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2840,19 +2840,6 @@ already_gone:
 	return false;
 }
 
-static bool __flush_work(struct work_struct *work)
-{
-	struct wq_barrier barr;
-
-	if (start_flush_work(work, &barr)) {
-		wait_for_completion(&barr.done);
-		destroy_work_on_stack(&barr.work);
-		return true;
-	} else {
-		return false;
-	}
-}
-
 /**
  * flush_work - wait for a work to finish executing the last queueing instance
  * @work: the work to flush
@@ -2866,10 +2853,18 @@ static bool __flush_work(struct work_struct *work)
  */
 bool flush_work(struct work_struct *work)
 {
+	struct wq_barrier barr;
+
 	lock_map_acquire(&work->lockdep_map);
 	lock_map_release(&work->lockdep_map);
 
-	return __flush_work(work);
+	if (start_flush_work(work, &barr)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else {
+		return false;
+	}
 }
 EXPORT_SYMBOL_GPL(flush_work);
 
@@ -4814,14 +4809,7 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
 	schedule_work_on(cpu, &wfc.work);
-
-	/*
-	 * The work item is on-stack and can't lead to deadlock through
-	 * flushing.  Use __flush_work() to avoid spurious lockdep warnings
-	 * when work_on_cpu()s are nested.
-	 */
-	__flush_work(&wfc.work);
-
+	flush_work(&wfc.work);
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux