Re: [PATCH 1/7] PCI: Make sriov work with hotplug remove

On Mon, Jan 23, 2012 at 8:06 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Jan 21, 2012 at 1:52 AM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>>
>> +       /*
>> +        * pci_stop_bus_device(dev) will not remove dev from bus->devices list,
>> +        *  so We don't need use _safe version for_each here.
>> +        * Also _safe version has problem when pci_stop_bus_device() for PF try
>> +        *  to remove VFs.
>> +        */
>> +       for (l = head->next; l != head;) {
>
> That's crazy. Why would you open-code this? Why isn't it just a
> "list_for_each()"?

My previous version used list_for_each(), but Kenji thought we should
open-code the loop because that makes it clear that l is updated inside
the loop.

>
> And what are the problems with the safe version? If the safe version
> doesn't work, then something is *seriously* wrong with the list.

In list_for_each_safe():

#define list_for_each_safe(pos, n, head) \
        for (pos = (head)->next, n = pos->next; pos != (head); \
                pos = n, n = pos->next)

n is saved before the loop body runs, so "safe" only means that pos may be
removed (and freed) from the list while n can still be used for the next
iteration.

In our case, the list has a PF and several VFs. When pci_stop_bus_device()
is called for the PF, pos is still valid, but the VFs are removed from the
list, so n (which points to a VF) is no longer valid.
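To make the failure mode concrete, here is a small user-space model. It is a sketch, not kernel code: struct fake_dev, build_bus(), stop_dev() and the two counting functions are hypothetical stand-ins, and the list macros are simplified copies of the ones in include/linux/list.h. Stopping the PF unlinks its three VFs, the way pci_disable_sriov() does via pci_remove_bus_device(). Instead of freeing the VFs, the model keeps them in a static array, so a stale visit can be counted rather than crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified copies of the kernel's list primitives (include/linux/list.h). */
struct list_head {
	struct list_head *next, *prev;
};

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
		pos = n, n = pos->next)

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_head(struct list_head *h)
{
	h->next = h->prev = h;
}

static void add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* Unlink e but leave its own next/prev untouched (no poisoning, no free). */
static void del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Hypothetical stand-in for struct pci_dev on bus->devices. */
struct fake_dev {
	bool is_virtfn;
	bool on_list;
	struct list_head bus_list;
};

#define NDEV 4	/* one PF followed by three VFs, like b2:00.0 ... b2:00.3 */
static struct fake_dev devs[NDEV];
static struct list_head bus_devices;

static void build_bus(void)
{
	init_head(&bus_devices);
	for (int i = 0; i < NDEV; i++) {
		devs[i].is_virtfn = (i != 0);
		devs[i].on_list = true;
		add_tail(&devs[i].bus_list, &bus_devices);
	}
	assert(bus_devices.next == &devs[0].bus_list);
}

/* Stopping the PF models driver->remove() -> pci_disable_sriov(): VFs are unlinked. */
static void stop_dev(struct fake_dev *d)
{
	if (d->is_virtfn)
		return;
	for (int i = 1; i < NDEV; i++) {
		if (devs[i].on_list) {
			del(&devs[i].bus_list);
			devs[i].on_list = false;
		}
	}
}

/* Count iterations that land on an entry already unlinked from the list. */
static int stale_visits_safe(void)
{
	struct list_head *l, *n;
	int stale = 0;

	build_bus();
	list_for_each_safe(l, n, &bus_devices) {
		struct fake_dev *d = container_of(l, struct fake_dev, bus_list);

		if (!d->on_list)
			stale++;	/* in the kernel this would be a use-after-free */
		stop_dev(d);
	}
	return stale;
}

static int stale_visits_plain(void)
{
	struct list_head *l;
	int stale = 0;

	build_bus();
	list_for_each(l, &bus_devices) {
		struct fake_dev *d = container_of(l, struct fake_dev, bus_list);

		if (!d->on_list)
			stale++;
		stop_dev(d);
	}
	return stale;
}
```

In this model stale_visits_safe() returns 3 (each VF is visited after it was unlinked), while stale_visits_plain() returns 0: list_for_each() reads pos->next only after the body has run, and the PF itself stays on the list. That is the property the open-coded loop relies on, and it holds only as long as stopping a device never unlinks pos itself.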

>
>> +               struct pci_dev *dev = pci_dev_b(l);
>> +
>> +               /*
>> +                * VFs are removed by pci_remove_bus_device() in the
>> +                *  pci_stop_bus_devices() code path for PF.
>> +                *  aka, bus->devices get updated in the process.
>> +                * barrier() will make sure we get right next from that list.
>> +                */
>> +               if (!dev->is_virtfn) {
>> +                       pci_stop_bus_device(dev);
>> +                       barrier();
>> +               }
>
> And this is just insanity. The "barrier()" cannot *possibly* do
> anything sane. If it really makes a difference, there is again some
> serious problem with the whole f*cking thing.
>
> NAK on the patch until sanity is restored. This is just total voodoo
> programming.

Sorry about that.

Could you please check the V1 version instead?

https://lkml.org/lkml/2011/10/15/141
or the attached copy.

Thanks

Yinghai
From: Yinghai Lu <yinghai@xxxxxxxxx>
Subject: [PATCH 01/10] PCI: Make sriov work with hotplug remove

When hot-removing a PCI Express module that has a PCIe switch and supports SR-IOV, we got:

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]--
 |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
 |           |                               |            +-00.1 Device
 |           |                               |            +-00.2 Device
 |           |                               |            \-00.3 Device
 |           |                               \-03.0-[b3]--+-00.0 Device
 |           |                                            +-00.1 Device
 |           |                                            +-00.2 Device
 |           |                                            \-00.3 Device

Root port: 80:02.2
PCI Express module: contains a PCIe switch, which appears as b0:00.0, b1:02.0 and b1:03.0.
End devices (PFs): b2:00.0 and b3:00.0.
VFs: b2:00.1 ... b2:00.3 and b3:00.1 ... b3:00.3.

Root cause: calling pci_stop_bus_device() on a physical function (PF) stops its
virtual functions (VFs) and removes them from the bus, so
	list_for_each_safe(l, n, &bus->devices)
may dereference a freed n that points to a VF entry.

The solution is to call pci_stop_bus_device() on PFs only. Before doing so, save
the PFs on a separate list so that the loop does not iterate over bus->devices.
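The save-aside approach can be exercised in a similar user-space model. Everything here is hypothetical (struct fake_dev, stop_dev(), run_model(), and malloc()/free() standing in for kmalloc()/kfree()); only the shape of the three passes follows the patched pci_stop_bus_devices(). Stopping a PF unlinks that PF's VFs from the bus list, but because pass 2 walks the private physfn_list, those removals cannot invalidate its saved next pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified copies of the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)
#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
		pos = n, n = pos->next)
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_head(struct list_head *h) { h->next = h->prev = h; }

static void add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Hypothetical model: two PFs (indices 0, 1), each owning two VFs. */
struct fake_dev {
	int physfn;		/* -1 for a PF, else index of the owning PF */
	bool stopped;
	struct list_head bus_list;
};

#define NDEV 6
static struct fake_dev devs[NDEV];
static struct list_head bus_devices;

static void build_bus(void)
{
	static const int owner[NDEV] = { -1, -1, 0, 0, 1, 1 };

	init_head(&bus_devices);
	for (int i = 0; i < NDEV; i++) {
		devs[i] = (struct fake_dev){ .physfn = owner[i] };
		add_tail(&devs[i].bus_list, &bus_devices);
	}
}

/* Stopping a PF also stops its VFs and unlinks them from the bus list. */
static void stop_dev(struct fake_dev *d)
{
	d->stopped = true;
	for (int i = 0; d->physfn < 0 && i < NDEV; i++) {
		if (devs[i].physfn == (int)(d - devs) && !devs[i].stopped) {
			devs[i].stopped = true;
			del(&devs[i].bus_list);
		}
	}
}

struct dev_list { struct fake_dev *dev; struct list_head list; };

/* Same three passes as the patched pci_stop_bus_devices(). */
static void stop_bus_devices(struct list_head *bus)
{
	struct list_head physfn_list, *l, *n;

	init_head(&physfn_list);
	list_for_each(l, bus) {			/* 1: save PFs aside */
		struct fake_dev *d = container_of(l, struct fake_dev, bus_list);
		struct dev_list *dl;

		if (d->physfn >= 0 || !(dl = malloc(sizeof(*dl))))
			continue;		/* VF, or allocation failed */
		dl->dev = d;
		add_tail(&dl->list, &physfn_list);
	}
	list_for_each_safe(l, n, &physfn_list) {	/* 2: stop PFs */
		struct dev_list *dl = container_of(l, struct dev_list, list);

		assert(dl->dev->physfn < 0);
		stop_dev(dl->dev);	/* unlinks VFs from bus only */
		del(&dl->list);		/* unlink before freeing */
		free(dl);
	}
	list_for_each_safe(l, n, bus)		/* 3: leftovers, if any */
		stop_dev(container_of(l, struct fake_dev, bus_list));
}

/* True if every device was stopped and only the two PFs remain linked. */
static bool run_model(void)
{
	build_bus();
	stop_bus_devices(&bus_devices);
	for (int i = 0; i < NDEV; i++)
		if (!devs[i].stopped)
			return false;
	return bus_devices.next == &devs[0].bus_list &&
	       devs[0].bus_list.next == &devs[1].bus_list &&
	       devs[1].bus_list.next == &bus_devices;
}
```

With two PFs owning two VFs each, run_model() returns true: every device is stopped and only the PFs remain on the bus list. In this model, as in the patch, a PF whose allocation fails in pass 1 is left to the leftover pass, which walks the bus list with list_for_each_safe() again.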

While reviewing the patch, Bjorn said:
|   The PCI hot-remove path calls pci_stop_bus_devices() via
|   pci_remove_bus_device().
|
|   pci_stop_bus_devices() traverses the bus->devices list (point A below),
|   stopping each device in turn, which calls the driver remove() method.  When
|   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
|   also uses pci_remove_bus_device() to remove the VF devices from the
|   bus->devices list (point B).
|
|       pci_remove_bus_device
|         pci_stop_bus_device
|           pci_stop_bus_devices(subordinate)
|             list_for_each(bus->devices)             <-- A
|               pci_stop_bus_device(PF)
|                 ...
|                   driver->remove
|                     pci_disable_sriov
|                       ...
|                         pci_remove_bus_device(VF)
|                             <remove from bus_list>  <-- B
|
|   At B, we're changing the same list we're iterating through at A, so when
|   the driver remove() method returns, the pci_stop_bus_devices() iterator has
|   a pointer to a list entry that has already been freed.
|
|   This patch avoids the problem by building a separate list of all PFs on
|   the bus and traversing that at A instead of the bus->devices list.

The discussion thread can be found at: https://lkml.org/lkml/2011/10/15/141

Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>

---
 drivers/pci/remove.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci
 			pci_remove_bus_device(pci_dev_b(l));
 }
 
+struct dev_list {
+	struct pci_dev *dev;
+	struct list_head list;
+};
+
 static void pci_stop_bus_devices(struct pci_bus *bus)
 {
 	struct list_head *l, *n;
+	struct dev_list *dl, *dn;
+	LIST_HEAD(physfn_list);
+
+	/* Save phys_fn aside at first */
+	list_for_each(l, &bus->devices) {
+		struct pci_dev *dev = pci_dev_b(l);
+
+		if (!dev->is_virtfn) {
+			dl = kmalloc(sizeof(*dl), GFP_KERNEL);
+			if (!dl)
+				continue;
+			dl->dev = dev;
+			list_add_tail(&dl->list, &physfn_list);
+		}
+	}
+
+	/*
+	 * stop bus device for phys_fn at first
+	 *  it will stop and remove vf in driver remove action
+	 */
+	list_for_each_entry_safe(dl, dn, &physfn_list, list) {
+		struct pci_dev *dev = dl->dev;
+
+		pci_stop_bus_device(dev);
+
+		kfree(dl);
+	}
 
+	/* Do it again for left over in case */
 	list_for_each_safe(l, n, &bus->devices) {
 		struct pci_dev *dev = pci_dev_b(l);
 		pci_stop_bus_device(dev);
