On Sat, Nov 30, 2024 at 12:23:19PM -0600, Bjorn Helgaas wrote: > [+to Mani; +cc Jon, Saurabh] > > On Tue, Nov 26, 2024 at 01:04:34PM -0800, Brian Norris wrote: > > of_find_device_by_node() doesn't like a NULL pointer, and may end up > > identifying an arbitrary device, which we then start tearing down. We > > should check for NULL first. > > > > Resolves issues seen when doing `echo 1 > /sys/bus/pci/devices/.../remove`: > > > > [ 222.952201] ------------[ cut here ]------------ > > [ 222.952218] WARNING: CPU: 0 PID: 5095 at drivers/regulator/core.c:5885 regulator_unregister+0x140/0x160 > > ... > > [ 222.953490] CPU: 0 UID: 0 PID: 5095 Comm: bash Tainted: G C 6.12.0-rc1 #3 > > ... > > [ 222.954134] Call trace: > > [ 222.954150] regulator_unregister+0x140/0x160 > > [ 222.954186] devm_rdev_release+0x1c/0x30 > > [ 222.954215] release_nodes+0x68/0x100 > > [ 222.954249] devres_release_all+0x98/0xf8 > > [ 222.954282] device_unbind_cleanup+0x20/0x70 > > [ 222.954306] device_release_driver_internal+0x1f4/0x240 > > [ 222.954333] device_release_driver+0x20/0x40 > > [ 222.954358] bus_remove_device+0xd8/0x170 > > [ 222.954393] device_del+0x154/0x380 > > [ 222.954422] device_unregister+0x28/0x88 > > [ 222.954451] of_device_unregister+0x1c/0x30 > > [ 222.954488] pci_stop_bus_device+0x154/0x1b0 > > [ 222.954521] pci_stop_and_remove_bus_device_locked+0x28/0x48 > > [ 222.954553] remove_store+0xa0/0xb8 > > [ 222.954589] dev_attr_store+0x20/0x40 > > [ 222.954615] sysfs_kf_write+0x4c/0x68 > > [ 222.954644] kernfs_fop_write_iter+0x128/0x200 > > [ 222.954670] do_iter_readv_writev+0xdc/0x1e0 > > [ 222.954709] vfs_writev+0x100/0x2a0 > > [ 222.954742] do_writev+0x84/0x130 > > [ 222.954773] __arm64_sys_writev+0x28/0x40 > > [ 222.954808] invoke_syscall+0x50/0x120 > > [ 222.954845] el0_svc_common.constprop.0+0x48/0xf0 > > [ 222.954878] do_el0_svc+0x24/0x38 > > [ 222.954910] el0_svc+0x34/0xe0 > > [ 222.954945] el0t_64_sync_handler+0x120/0x138 > > [ 222.954978] el0t_64_sync+0x190/0x198 > > [ 222.955006] ---[ end trace 0000000000000000 ]--- > > [ 222.965216] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0 > > ... > > [ 223.107395] CPU: 3 UID: 0 PID: 5095 Comm: bash Tainted: G WC 6.12.0-rc1 #3 > > ... > > [ 223.227750] Call trace: > > [ 223.230501] pci_stop_bus_device+0x190/0x1b0 > > [ 223.235314] pci_stop_and_remove_bus_device_locked+0x28/0x48 > > [ 223.241672] remove_store+0xa0/0xb8 > > [ 223.245616] dev_attr_store+0x20/0x40 > > [ 223.249737] sysfs_kf_write+0x4c/0x68 > > [ 223.253859] kernfs_fop_write_iter+0x128/0x200 > > [ 223.253887] do_iter_readv_writev+0xdc/0x1e0 > > [ 223.263631] vfs_writev+0x100/0x2a0 > > [ 223.267550] do_writev+0x84/0x130 > > [ 223.271273] __arm64_sys_writev+0x28/0x40 > > [ 223.275774] invoke_syscall+0x50/0x120 > > [ 223.279988] el0_svc_common.constprop.0+0x48/0xf0 > > [ 223.285270] do_el0_svc+0x24/0x38 > > [ 223.288993] el0_svc+0x34/0xe0 > > [ 223.292426] el0t_64_sync_handler+0x120/0x138 > > [ 223.297311] el0t_64_sync+0x190/0x198 > > [ 223.301423] Code: 17fffff8 91030000 d2800101 f9800011 (c85f7c02) > > [ 223.308248] ---[ end trace 0000000000000000 ]--- > > > > Fixes: 681725afb6b9 ("PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent") > > Signed-off-by: Brian Norris <briannorris@xxxxxxxxxxxx> > > Applied to pci/for-linus; hopefully this can still make v6.13-rc1. > > I added a Reported-by for Saurabh and tried to add some context around > why we got in this fix in the first place, but I'm not confident that > I got it all right. Please comment: > > commit 5c8418cf4025 ("PCI/pwrctrl: Unregister platform device only if one actually exists") > Author: Brian Norris <briannorris@xxxxxxxxxxxx> > Date: Tue Nov 26 13:04:34 2024 -0800 > > PCI/pwrctrl: Unregister platform device only if one actually exists > > If a PCI device has an associated device_node with power supplies, > pci_bus_add_device() creates platform devices for use by pwrctrl. When the > PCI device is removed, pci_stop_dev() uses of_find_device_by_node() to > locate the related platform device, then unregisters it. > > But when we remove a PCI device with no associated device node, > dev_of_node(dev) is NULL, and of_find_device_by_node(NULL) returns the > first device with "dev->of_node == NULL". The result is that we (a) > mistakenly unregister a completely unrelated platform device, leading to > issues like the first trace below, and (b) dereference the NULL pointer > from dev_of_node() when clearing OF_POPULATED, as in the second trace. > > Unregister a platform device only if there is one associated with this PCI > device. This resolves issues seen when doing: > > # echo 1 > /sys/bus/pci/devices/.../remove > > Sample issue from unregistering the wrong platform device: > > WARNING: CPU: 0 PID: 5095 at drivers/regulator/core.c:5885 regulator_unregister+0x140/0x160 > Call trace: > regulator_unregister+0x140/0x160 > devm_rdev_release+0x1c/0x30 > release_nodes+0x68/0x100 > devres_release_all+0x98/0xf8 > device_unbind_cleanup+0x20/0x70 > device_release_driver_internal+0x1f4/0x240 > device_release_driver+0x20/0x40 > bus_remove_device+0xd8/0x170 > device_del+0x154/0x380 > device_unregister+0x28/0x88 > of_device_unregister+0x1c/0x30 > pci_stop_bus_device+0x154/0x1b0 > pci_stop_and_remove_bus_device_locked+0x28/0x48 > remove_store+0xa0/0xb8 > dev_attr_store+0x20/0x40 > sysfs_kf_write+0x4c/0x68 > > Later NULL pointer dereference for of_node_clear_flag(NULL, OF_POPULATED): > > Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0 > Call trace: > pci_stop_bus_device+0x190/0x1b0 > pci_stop_and_remove_bus_device_locked+0x28/0x48 > remove_store+0xa0/0xb8 > dev_attr_store+0x20/0x40 > sysfs_kf_write+0x4c/0x68 > This looks good to me. Thanks for taking care! - Mani > Link: https://lore.kernel.org/r/20241126210443.4052876-1-briannorris@xxxxxxxxxxxx > Fixes: 681725afb6b9 ("PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent") > Reported-by: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx> > Closes: https://lore.kernel.org/r/1732890621-19656-1-git-send-email-ssengar@xxxxxxxxxxxxxxxxxxx > Signed-off-by: Brian Norris <briannorris@xxxxxxxxxxxx> > [bhelgaas: commit log] > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > > --- > > > > drivers/pci/remove.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > > index 963b8d2855c1..efc37fcb73e2 100644 > > --- a/drivers/pci/remove.c > > +++ b/drivers/pci/remove.c > > @@ -19,14 +19,19 @@ static void pci_free_resources(struct pci_dev *dev) > > > > static void pci_pwrctrl_unregister(struct device *dev) > > { > > + struct device_node *np; > > struct platform_device *pdev; > > > > - pdev = of_find_device_by_node(dev_of_node(dev)); > > + np = dev_of_node(dev); > > + if (!np) > > + return; > > + > > + pdev = of_find_device_by_node(np); > > if (!pdev) > > return; > > > > of_device_unregister(pdev); > > - of_node_clear_flag(dev_of_node(dev), OF_POPULATED); > > + of_node_clear_flag(np, OF_POPULATED); > > } > > > > static void pci_stop_dev(struct pci_dev *dev) > > -- > > 2.47.0.338 > > -- மணிவண்ணன் சதாசிவம்