On Mon, May 10, 2021 at 05:57:33PM -0500, Bjorn Helgaas wrote: > On Mon, May 10, 2021 at 03:02:45PM +0000, Slivka, Danijel wrote: > > Hi Bjorn, > > > > Yes, I get segmentation fault on unloading custom module during the > > check of error handling case. > > > > There is no directly visible access to res_attr fields, as you > > mentioned, other than the one in a call chain after a check in > > pci_remove_resource_files() which seems to cause the issue > > (accessing name). > > > > Load and unload module will invoke pci_enable_sriov() and > > disable_pci_sriov() respectively that are both having a call of > > pci_remove_resource_files() in call chain. > > > > In that function existing check is not behaving as expected since > > memory is freed but pointers left dangling. > > > > Below is call trace and detail description. > > > > During loading of module pci_enable_sriov() is called, I have > > following invoking sequence: > > > > device_create_file > > pci_create_capabilities_sysfs > > pci_create_sysfs_dev_files > > pci_bus_add_device > > pci_iov_add_virtfn > > sriov_enable > > pci_enable_sriov > > OK. For anybody following along, this call path changed in v5.13-rc1, > so pci_create_capabilities_sysfs() longer exists. But looking at > v5.12, I think the sequence you're seeing is: > > pci_create_sysfs_dev_files > pci_create_capabilities_sysfs > retval = device_create_file(&dev->dev, &dev_attr_reset) > return retval # I guess this what failed, right? > if (retval) goto err_rom_file > err_rom_file: > ... > pci_remove_resource_files > sysfs_remove_bin_file(pdev->res_attr[i]) > kfree(pdev->res_attr[i]) > # pdev->res_attr[i] not set to NULL in v5.12 > > Later, on module unload, we have this sequence: > > pci_disable_sriov > sriov_disable > sriov_del_vfs > pci_iov_remove_virtfn > pci_stop_and_remove_bus_device > pci_stop_bus_device > pci_stop_dev > pci_remove_sysfs_dev_files > pci_remove_resource_files > sysfs_remove_bin_file(pdev->res_attr[i]) > # pdev->res_attr[i] points to a freed object > > Definitely seems like a problem. Hmmm. I'm not really a fan of > checking the pointer to see whether it's been freed. That seems like > a band-aid. This patch does: + pdev->res_attr[i] = NULL; + pdev->res_attr_wc[i] = NULL; which I think essentially relies on the fact that kfree(NULL) is a no-op. I'm going to drop this for now because I don't want to rely on that. I'd rather avoid doing the kfree() altogether. IIRC this happens when device_create_file() fails, and it likely fails because of a race when creating the sysfs files, which would explain why we don't see lots of reports of this. Bjorn