On Wed, Oct 30, 2013 at 3:14 AM, Gu Zheng <guz.fnst@xxxxxxxxxxxxxx> wrote: > When concurent removing pci devices which are in the same pci subtree > via sysfs, such as: > echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 > > /sys/bus/pci/devices/0000\:1a\:01.0/remove > (1a:01.0 device is downstream from the 10:00.0 bridge) > > the following warning will show: > [ 1799.280918] ------------[ cut here ]------------ > [ 1799.336199] WARNING: CPU: 7 PID: 126 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0() > [ 1799.433093] list_del corruption, ffff8807b4a7c000->next is LIST_POISON1 (dead000000100100) > [ 1799.532110] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg dm_mirror dm_region_hash dm_log dm_mod vfat fat e1000e igb iTCO_wdt iTCO_vendor_support ioatdma ptp i7core_edac ipmi_si edac_core lpc_ich pps_core i2c_i801 pcspkr mfd_core dca ipmi_msghandler acpi_cpufreq xfs libcrc32c sd_mod crc_t10dif crct10dif_common i2c_algo_bit drm_kms_helper ttm drm mptsas scsi_transport_sas mptscsih i2c_core megaraid_sas mptbase > [ 1800.276623] CPU: 7 PID: 126 Comm: kworker/u512:1 Tainted: G W 3.12.0-rc5+ #196 > [ 1800.508918] Workqueue: sysfsd sysfs_schedule_callback_work > [ 1800.574703] 0000000000000009 ffff8807adbadbd8 ffffffff8168b26c ffff8807c27d08a8 > [ 1800.663860] ffff8807adbadc28 ffff8807adbadc18 ffffffff810711dc ffff8807adbadc68 > [ 1800.753130] ffff8807b4a7c000 ffff8807b4a7c000 ffff8807ad089c00 0000000000000000 > [ 1800.842282] Call Trace: > [ 1800.871651] [<ffffffff8168b26c>] dump_stack+0x55/0x76 > [ 1800.933301] [<ffffffff810711dc>] warn_slowpath_common+0x8c/0xc0 > [ 1801.005283] [<ffffffff810712c6>] warn_slowpath_fmt+0x46/0x50 > [ 1801.074081] [<ffffffff8135a343>] __list_del_entry+0x63/0xd0 > [ 1801.141839] [<ffffffff8135a3c1>] list_del+0x11/0x40 > [ 1801.201320] [<ffffffff813734da>] pci_remove_bus_device+0x6a/0xe0 > [ 1801.274279] [<ffffffff8137356e>] pci_stop_and_remove_bus_device+0x1e/0x30 > [ 1801.356606] [<ffffffff8137b20b>] remove_callback+0x2b/0x40 > [ 1801.423412] [<ffffffff81251848>] sysfs_schedule_callback_work+0x18/0x60 > [ 1801.503744] [<ffffffff8108eab5>] process_one_work+0x1f5/0x540 > [ 1801.573640] [<ffffffff8108ea53>] ? process_one_work+0x193/0x540 > [ 1801.645616] [<ffffffff8108f2ac>] worker_thread+0x11c/0x370 > [ 1801.712337] [<ffffffff8108f190>] ? rescuer_thread+0x350/0x350 > [ 1801.782178] [<ffffffff8109731d>] kthread+0xed/0x100 > [ 1801.841661] [<ffffffff81097230>] ? kthread_create_on_node+0x160/0x160 > [ 1801.919919] [<ffffffff8169cc3c>] ret_from_fork+0x7c/0xb0 > [ 1801.984608] [<ffffffff81097230>] ? kthread_create_on_node+0x160/0x160 > [ 1802.062825] ---[ end trace d77f2054de000fb7 ]--- > > This issue is related to the bug 54411: > https://bugzilla.kernel.org/show_bug.cgi?id=54411 > > Since we added the pci_bus reference management, the bug becomes a > invalid list entry warning as descripted above. Beacuse it still > trys to delete an deleted pci device from device list. > > So here we make stop device actually detach driver only, and remove > device will do device_del instead, and move bus_list change and pci device > resource free into pci_release_dev, so that it'll consistent with > bus reference managment, and the device only can be deleted from device > list in pci_release_dev just for one time. > > Besides, it also makes hostbridge to use device_unregister to be pair > with device_register before. > > This patch is based on Yinghai's privious similar patch: > http://lkml.org/lkml/2013/5/13/658 I have updated version, please check if attached patches fix the problem. virtfn_release.patch fix_cx3_hotplug_2.patch fix_racing_removing_6_1.patch fix_racing_removing_6_2.patch fix_racing_removing_6_3.patch Thanks Yinghai
Subject: [PATCH] PCI: Release PF ref during removing VF From: Yinghai Lu <yinghai@xxxxxxxxxx> commit dc087f2f6a2925e81831f3016b9cbb6e470e7423 (PCI: Simplify IOV implementation and fix reference count races) broke the pcie native hotplug with SRIOV enabled: PF is not freed during hot-remove, and later hot-add do not work as old pci_dev is still around, and can not create new pci_dev. That commit need VFs to be removed via pci_disable_sriov/virtfn_remove to make sure ref to PF is put back. So we can not call stop_and_remove for VF before PF, that will make VF get removed directly before PF's driver try to use pci_disable_sriov/virtfn_remove to do it. Solution would be 1. separating stop and remove in two iterations. 2. or make pci_stop_and_remove_bus_device could be used on VF. This patch is second solution: Separate PF ref releasing from virtfn_remove, all that during pci_destroy_dev for VFs. -v2: fix warning when SRIOV is not defined. Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> --- drivers/pci/iov.c | 34 +++++++++++++++++++++++++++++----- drivers/pci/pci.h | 4 ++++ drivers/pci/remove.c | 17 +++++++++++++++++ 3 files changed, 50 insertions(+), 5 deletions(-) Index: linux-2.6/drivers/pci/iov.c =================================================================== --- linux-2.6.orig/drivers/pci/iov.c +++ linux-2.6/drivers/pci/iov.c @@ -132,9 +132,37 @@ failed: return rc; } -static void virtfn_remove(struct pci_dev *dev, int id, int reset) +void virtfn_release(struct pci_dev *virtfn) { + int i; + struct pci_dev *dev; char buf[VIRTFN_ID_LEN]; + + if (!virtfn->is_virtfn) + return; + + dev = virtfn->physfn; + + /* + * Remove link to virtfn for PF. + * pci_disable_sriov() could be called after pci_stop_dev() is + * called for PF. So check sd to avoid spurious sfsfs warning. + */ + if (dev->dev.kobj.sd) + for (i = 0; i < dev->sriov->num_VFs; i++) + if ((virtfn_bus(dev, i) == virtfn->bus->number) && + (virtfn_devfn(dev, i) == virtfn->devfn)) { + sprintf(buf, "virtfn%u", i); + sysfs_remove_link(&dev->dev.kobj, buf); + break; + } + + virtfn_remove_bus(dev->bus, virtfn->bus); + pci_dev_put(dev); +} + +static void virtfn_remove(struct pci_dev *dev, int id, int reset) +{ struct pci_dev *virtfn; struct pci_sriov *iov = dev->sriov; @@ -149,8 +177,6 @@ static void virtfn_remove(struct pci_dev __pci_reset_function(virtfn); } - sprintf(buf, "virtfn%u", id); - sysfs_remove_link(&dev->dev.kobj, buf); /* * pci_stop_dev() could have been called for this virtfn already, * so the directory for the virtfn may have been removed before. @@ -161,12 +187,10 @@ static void virtfn_remove(struct pci_dev mutex_lock(&iov->dev->sriov->lock); pci_stop_and_remove_bus_device(virtfn); - virtfn_remove_bus(dev->bus, virtfn->bus); mutex_unlock(&iov->dev->sriov->lock); /* balance pci_get_domain_bus_and_slot() */ pci_dev_put(virtfn); - pci_dev_put(dev); } static int sriov_migration(struct pci_dev *dev) Index: linux-2.6/drivers/pci/pci.h =================================================================== --- linux-2.6.orig/drivers/pci/pci.h +++ linux-2.6/drivers/pci/pci.h @@ -259,6 +259,7 @@ int pci_iov_resource_bar(struct pci_dev resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno); void pci_restore_iov_state(struct pci_dev *dev); int pci_iov_bus_range(struct pci_bus *bus); +void virtfn_release(struct pci_dev *dev); #else static inline int pci_iov_init(struct pci_dev *dev) @@ -281,6 +282,9 @@ static inline int pci_iov_bus_range(stru { return 0; } +static inline void virtfn_release(struct pci_dev *dev) +{ +} #endif /* CONFIG_PCI_IOV */ Index: linux-2.6/drivers/pci/remove.c =================================================================== --- linux-2.6.orig/drivers/pci/remove.c +++ linux-2.6/drivers/pci/remove.c @@ -38,6 +38,8 @@ static void pci_destroy_dev(struct pci_d list_del(&dev->bus_list); up_write(&pci_bus_sem); + virtfn_release(dev); + pci_free_resources(dev); put_device(&dev->dev); } @@ -107,8 +109,23 @@ static void pci_remove_bus_device(struct */ void pci_stop_and_remove_bus_device(struct pci_dev *dev) { +#ifdef CONFIG_PCI_ATS + struct pci_sriov *iov = NULL; + int locked = 0; + + if (dev->is_virtfn) { + iov = dev->physfn->sriov; + locked = mutex_trylock(&iov->dev->sriov->lock); + } +#endif + pci_stop_bus_device(dev); pci_remove_bus_device(dev); + +#ifdef CONFIG_PCI_ATS + if (locked) + mutex_unlock(&iov->dev->sriov->lock); +#endif } EXPORT_SYMBOL(pci_stop_and_remove_bus_device);
Subject: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that From: Yinghai Lu <yinghai@xxxxxxxxxx> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423 (PCI: Simplify IOV implementation and fix reference count races) VF need to be removed via virtfn_remove to make sure ref to PF is put back. Some driver (like ixgbe) does not call pci_disable_sriov() if sriov is enabled via /sys/.../sriov_numvfs setting. ixgbe does allow driver for PF get detached, but still have VFs around. But how about PF get removed via /sys or pciehp finally? During hot-remove, VF will still hold one ref to PF and it prevent PF to be removed. That make the next hot-add fails, as old PF dev struct is still around. We need to add pci_disable_sriov() calling during stop PF . Need this one for v3.11 -v2: Accoring to Bjorn, move that calling to pci_stop_dev. Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> Cc: Jiang Liu <liuj97@xxxxxxxxx> Cc: Alexander Duyck <alexander.h.duyck@xxxxxxxxx> Cc: Donald Dutile <ddutile@xxxxxxxxxx> Cc: Greg Rose <gregory.v.rose@xxxxxxxxx> --- drivers/pci/remove.c | 2 ++ 1 file changed, 2 insertions(+) Index: linux-2.6/drivers/pci/remove.c =================================================================== --- linux-2.6.orig/drivers/pci/remove.c +++ linux-2.6/drivers/pci/remove.c @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev pci_proc_detach_device(dev); pci_remove_sysfs_dev_files(dev); device_del(&dev->dev); + /* remove VF, if PF driver skip that */ + pci_disable_sriov(dev); dev->is_added = 0; }
Subject: [PATCH 1/7] PCI: move back pci_proc_attach_devices calling From: Yinghai Lu <yinghai@xxxxxxxxxx> We stop detach proc when pci_stop_device. So should attach that during pci_bus_add_device. Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> --- drivers/pci/bus.c | 1 + drivers/pci/probe.c | 2 -- 2 files changed, 1 insertion(+), 2 deletions(-) Index: linux-2.6/drivers/pci/probe.c =================================================================== --- linux-2.6.orig/drivers/pci/probe.c +++ linux-2.6/drivers/pci/probe.c @@ -1382,8 +1382,6 @@ void pci_device_add(struct pci_dev *dev, dev->match_driver = false; ret = device_add(&dev->dev); WARN_ON(ret < 0); - - pci_proc_attach_device(dev); } struct pci_dev *__ref pci_scan_single_device(struct pci_bus *bus, int devfn) Index: linux-2.6/drivers/pci/bus.c =================================================================== --- linux-2.6.orig/drivers/pci/bus.c +++ linux-2.6/drivers/pci/bus.c @@ -175,6 +175,7 @@ int pci_bus_add_device(struct pci_dev *d * are not assigned yet for some devices. */ pci_fixup_device(pci_fixup_final, dev); + pci_proc_attach_device(dev); pci_create_sysfs_dev_files(dev); dev->match_driver = true;
Subject: [PATCH 2/7] PCI: move resources and bus_list releasing to pci_release_dev From: Yinghai Lu <yinghai@xxxxxxxxxx> We should not release resource in pci_destroy that is too early as there could be still other use hold reference. release them or remove it from bus devices list at last in pci_release_dev instead. Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> --- drivers/pci/probe.c | 25 +++++++++++++++++++++++-- drivers/pci/remove.c | 21 --------------------- 2 files changed, 23 insertions(+), 23 deletions(-) Index: linux-2.6/drivers/pci/probe.c =================================================================== --- linux-2.6.orig/drivers/pci/probe.c +++ linux-2.6/drivers/pci/probe.c @@ -1155,6 +1155,20 @@ static void pci_release_capabilities(str pci_free_cap_save_buffers(dev); } +static void pci_free_resources(struct pci_dev *dev) +{ + int i; + + msi_remove_pci_irq_vectors(dev); + + pci_cleanup_rom(dev); + for (i = 0; i < PCI_NUM_RESOURCES; i++) { + struct resource *res = dev->resource + i; + if (res->parent) + release_resource(res); + } +} + /** * pci_release_dev - free a pci device structure when all users of it are finished. * @dev: device that's been disconnected @@ -1164,9 +1178,16 @@ static void pci_release_capabilities(str */ static void pci_release_dev(struct device *dev) { - struct pci_dev *pci_dev; + struct pci_dev *pci_dev = to_pci_dev(dev); + + down_write(&pci_bus_sem); + list_del(&pci_dev->bus_list); + up_write(&pci_bus_sem); + + virtfn_release(pci_dev); + + pci_free_resources(pci_dev); - pci_dev = to_pci_dev(dev); pci_release_capabilities(pci_dev); pci_release_of_node(pci_dev); pcibios_release_device(pci_dev); Index: linux-2.6/drivers/pci/remove.c =================================================================== --- linux-2.6.orig/drivers/pci/remove.c +++ linux-2.6/drivers/pci/remove.c @@ -3,20 +3,6 @@ #include <linux/pci-aspm.h> #include "pci.h" -static void pci_free_resources(struct pci_dev *dev) -{ - int i; - - msi_remove_pci_irq_vectors(dev); - - pci_cleanup_rom(dev); - for (i = 0; i < PCI_NUM_RESOURCES; i++) { - struct resource *res = dev->resource + i; - if (res->parent) - release_resource(res); - } -} - static void pci_stop_dev(struct pci_dev *dev) { pci_pme_active(dev, false); @@ -36,13 +22,6 @@ static void pci_stop_dev(struct pci_dev static void pci_destroy_dev(struct pci_dev *dev) { - down_write(&pci_bus_sem); - list_del(&dev->bus_list); - up_write(&pci_bus_sem); - - virtfn_release(dev); - - pci_free_resources(dev); put_device(&dev->dev); }
Subject: [PATCH 3/7] PCI: Destroy pci dev only once From: Yinghai Lu <yinghai@xxxxxxxxxx> Mutliple removing via /sys will call pci_destroy_dev two times. Add is_removed to record if pci_destroy_dev is called already. During second calling, still have extra dev ref hold via device_schedule_call, so we are safe to check dev->is_removed. Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx> --- drivers/pci/remove.c | 5 ++++- include/linux/pci.h | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) Index: linux-2.6/drivers/pci/remove.c =================================================================== --- linux-2.6.orig/drivers/pci/remove.c +++ linux-2.6/drivers/pci/remove.c @@ -22,7 +22,10 @@ static void pci_stop_dev(struct pci_dev static void pci_destroy_dev(struct pci_dev *dev) { - put_device(&dev->dev); + if (!dev->is_removed) { + dev->is_removed = 1; + put_device(&dev->dev); + } } void pci_remove_bus(struct pci_bus *bus) Index: linux-2.6/include/linux/pci.h =================================================================== --- linux-2.6.orig/include/linux/pci.h +++ linux-2.6/include/linux/pci.h @@ -308,6 +308,7 @@ struct pci_dev { unsigned int multifunction:1;/* Part of multi-function device */ /* keep track of device state */ unsigned int is_added:1; + unsigned int is_removed:1; /* pci_destroy_dev is called */ unsigned int is_busmaster:1; /* device is busmaster */ unsigned int no_msi:1; /* device may not use msi */ unsigned int block_cfg_access:1; /* config space access is blocked */