On Sun, 7 Jun 2020 10:43:48 +0100
Marc Zyngier <maz@xxxxxxxxxx> wrote:

Hi Bjorn,

> On a system that creates VFs for multiple PFs in parallel (in
> this case, network bringup at boot time), and when these VFs
> end-up on the same bus, bad things sometimes happen:
>
> [ 12.755534] sysfs: cannot create duplicate filename '/devices/platform/soc/fc000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:01.0/pci_bus/0000:04'
> [ 12.755700] pci 0000:04:10.1: [8086:10ca] type 00 class 0x020000
> [ 12.763785] CPU: 1 PID: 581 Comm: vfs Tainted: G E 5.7.0-00033-g002d24ebd695 #1119
> [ 12.770402] igb 0000:03:00.1: 1 VFs allocated
> [ 12.778493] Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020
> [ 12.778496] Call trace:
> [ 12.778506] dump_backtrace+0x0/0x1d0
> [ 12.778511] show_stack+0x20/0x30
> [ 12.778516] dump_stack+0xb8/0x100
> [ 12.778520] sysfs_warn_dup+0x6c/0x88
> [ 12.778530] sysfs_create_dir_ns+0xe8/0x100
> [ 12.778535] kobject_add_internal+0xe0/0x3a0
> [ 12.778541] kobject_add+0x94/0x100
> [ 12.817654] device_add+0x104/0x7b8
> [ 12.821100] device_register+0x28/0x38
> [ 12.824810] pci_add_new_bus+0x1f8/0x488
> [ 12.828692] pci_iov_add_virtfn+0x2c8/0x360
> [ 12.832830] sriov_enable+0x200/0x458
> [ 12.836452] pci_enable_sriov+0x20/0x38
> [ 12.840282] igb_enable_sriov+0x148/0x290 [igb]
> [ 12.844745] igb_pci_sriov_configure+0x40/0x80 [igb]
> [ 12.849650] sriov_numvfs_store+0xb0/0x1a0
> [ 12.853703] dev_attr_store+0x20/0x38
> [ 12.857327] sysfs_kf_write+0x4c/0x60
> [ 12.860947] kernfs_fop_write+0x104/0x220
> [ 12.864916] __vfs_write+0x24/0x50
> [ 12.868279] vfs_write+0xec/0x1d8
> [ 12.871556] ksys_write+0x74/0x100
> [ 12.874919] __arm64_sys_write+0x24/0x30
> [ 12.878802] el0_svc_common.constprop.0+0x7c/0x1f8
> [ 12.883544] do_el0_svc+0x2c/0x98
> [ 12.886824] el0_svc+0x18/0x48
> [ 12.889841] el0_sync_handler+0x120/0x290
> [ 12.893808] el0_sync+0x158/0x180
> [ 12.897143] kobject_add_internal failed for 0000:04 with -EEXIST, don't try to register things with the same name in the same directory.
> [ 12.897634] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k
>
> It turns out that virtfn_add_bus() doesn't hold any lock, which
> means there is a potential race between checking that the bus
> exists already, and adding it if it doesn't.
>
> A per-device lock wouldn't help, as this happens when multiple
> PFs insert their respective VFs concurrently.
>
> Instead, let's introduce new mutex, private to the IOV subsystem,
> that gets taken when dealing with a virtfn bus (either creation
> or destruction). This ensures that these operations get serialized.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>

Did you have a chance to look into this? I can reliably trigger it
on one of my boxes. Happy to help debugging it further if you think
this hack isn't the right fix.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
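
For anyone skimming the thread, the check-then-add race described in the
quoted commit message can be illustrated with a small userspace analogy.
This is only a sketch using POSIX threads, not the kernel patch itself:
the names bus_lock, bus_exists(), get_or_add_bus() and
run_parallel_creation() are hypothetical, and the fixed-size array merely
stands in for the real list of child buses. The point it shows is that
the lookup and the insertion sit under one global lock, so callers coming
from different parents cannot both decide that the bus is missing.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUSES   16
#define MAX_THREADS 64

/* Hypothetical stand-in for the set of already-registered buses. */
static int bus_numbers[MAX_BUSES];
static size_t bus_count;

/* One global lock, playing the role of the subsystem-private mutex. */
static pthread_mutex_t bus_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold bus_lock. */
static bool bus_exists(int busnr)
{
    for (size_t i = 0; i < bus_count; i++)
        if (bus_numbers[i] == busnr)
            return true;
    return false;
}

/*
 * Check and add under the same lock, so no other thread can register
 * the same bus between the lookup and the insertion.
 * Returns true only for the caller that actually created the entry.
 */
static bool get_or_add_bus(int busnr)
{
    bool created = false;

    pthread_mutex_lock(&bus_lock);
    if (!bus_exists(busnr) && bus_count < MAX_BUSES) {
        bus_numbers[bus_count++] = busnr;
        created = true;
    }
    pthread_mutex_unlock(&bus_lock);

    return created;
}

static void *worker(void *arg)
{
    get_or_add_bus(*(int *)arg);
    return NULL;
}

/*
 * Start nthreads threads that all request the same bus number and
 * return how many entries exist once every thread has finished.
 */
size_t run_parallel_creation(int nthreads, int busnr)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    pthread_mutex_lock(&bus_lock);
    bus_count = 0;                      /* reset state between runs */
    pthread_mutex_unlock(&bus_lock);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[i], NULL, worker, &busnr) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return bus_count;
}
```

With the lock in place, any number of threads asking for the same bus
number leave exactly one entry behind. Dropping the lock/unlock pair in
get_or_add_bus() reopens the window between the check and the add, which
is the userspace equivalent of the duplicate sysfs entry in the log above.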