On Tue, Oct 13, 2020 at 10:58:20AM +0100, Marc Zyngier wrote:
> If, for some reason, the xusb PHY fails to probe, it leaves
> a dangling pointer attached to the platform device structure.
>
> This would normally be harmless, but the Tegra XHCI driver then
> goes and extract that pointer from the PHY device. Things go
> downhill from there:
>
> [ 8.752082] [004d554e5145533c] address between user and kernel address ranges
> [ 8.752085] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 8.752088] Modules linked in: max77620_regulator(E+) xhci_tegra(E+) sdhci_tegra(E+) xhci_hcd(E) sdhci_pltfm(E) cqhci(E) fixed(E) usbcore(E) scsi_mod(E) sdhci(E) host1x(E+)
> [ 8.752103] CPU: 4 PID: 158 Comm: systemd-udevd Tainted: G S W E 5.9.0-rc7-00298-gf6337624c4fe #1980
> [ 8.752105] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
> [ 8.752108] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
> [ 8.752115] pc : kobject_put+0x1c/0x21c
> [ 8.752120] lr : put_device+0x20/0x30
> [ 8.752121] sp : ffffffc012eb3840
> [ 8.752122] x29: ffffffc012eb3840 x28: ffffffc010e82638
> [ 8.752125] x27: ffffffc008d56440 x26: 0000000000000000
> [ 8.752128] x25: ffffff81eb508200 x24: 0000000000000000
> [ 8.752130] x23: ffffff81eb538800 x22: 0000000000000000
> [ 8.752132] x21: 00000000fffffdfb x20: ffffff81eb538810
> [ 8.752134] x19: 3d4d554e51455300 x18: 0000000000000020
> [ 8.752136] x17: ffffffc008d00270 x16: ffffffc008d00c94
> [ 8.752138] x15: 0000000000000004 x14: ffffff81ebd4ae90
> [ 8.752140] x13: 0000000000000000 x12: ffffff81eb86a4e8
> [ 8.752142] x11: ffffff81eb86a480 x10: ffffff81eb862fea
> [ 8.752144] x9 : ffffffc01055fb28 x8 : ffffff81eb86a4a8
> [ 8.752146] x7 : 0000000000000001 x6 : 0000000000000001
> [ 8.752148] x5 : ffffff81dff8bc38 x4 : 0000000000000000
> [ 8.752150] x3 : 0000000000000001 x2 : 0000000000000001
> [ 8.752152] x1 : 0000000000000002 x0 : 3d4d554e51455300
> [ 8.752155] Call trace:
> [ 8.752157] kobject_put+0x1c/0x21c
> [ 8.752160] put_device+0x20/0x30
> [ 8.752164] tegra_xusb_padctl_put+0x24/0x3c
> [ 8.752170] tegra_xusb_probe+0x8b0/0xd10 [xhci_tegra]
> [ 8.752174] platform_drv_probe+0x60/0xb4
> [ 8.752176] really_probe+0xf0/0x504
> [ 8.752179] driver_probe_device+0x100/0x170
> [ 8.752181] device_driver_attach+0xcc/0xd4
> [ 8.752183] __driver_attach+0xb0/0x17c
> [ 8.752185] bus_for_each_dev+0x7c/0xd4
> [ 8.752187] driver_attach+0x30/0x3c
> [ 8.752189] bus_add_driver+0x154/0x250
> [ 8.752191] driver_register+0x84/0x140
> [ 8.752193] __platform_driver_register+0x54/0x60
> [ 8.752197] tegra_xusb_init+0x40/0x1000 [xhci_tegra]
> [ 8.752201] do_one_initcall+0x54/0x2d0
> [ 8.752205] do_init_module+0x68/0x29c
> [ 8.752207] load_module+0x2178/0x26c0
> [ 8.752209] __do_sys_finit_module+0xb0/0x120
> [ 8.752211] __arm64_sys_finit_module+0x2c/0x40
> [ 8.752215] el0_svc_common.constprop.0+0x80/0x240
> [ 8.752218] do_el0_svc+0x30/0xa0
> [ 8.752220] el0_svc+0x18/0x50
> [ 8.752223] el0_sync_handler+0x90/0x318
> [ 8.752225] el0_sync+0x158/0x180
> [ 8.752230] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (3940f000)
> [ 8.752232] ---[ end trace 90f6c89d62d85ff5 ]---
>
> Reset the pointer on probe failure fixes the issue.
>
> Fixes: 53d2a715c2403 ("phy: Add Tegra XUSB pad controller support")
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> ---
>  drivers/phy/tegra/xusb.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c
> index de4a46fe1763..ad88d74c1884 100644
> --- a/drivers/phy/tegra/xusb.c
> +++ b/drivers/phy/tegra/xusb.c
> @@ -1242,6 +1242,7 @@ static int tegra_xusb_padctl_probe(struct platform_device *pdev)
>  reset:
>  	reset_control_assert(padctl->rst);
>  remove:
> +	platform_set_drvdata(pdev, NULL);
>  	soc->ops->remove(padctl);
>  	return err;
>  }

Sorry, I had missed this before. Why is this necessary? The driver core
already does dev_set_drvdata(dev, NULL) on failure, which is the same as
your platform_set_drvdata() here.
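
To make that concrete, here is a minimal userspace sketch of what I mean.
It is only a model, not the real driver core code: every name in it
(model_device, model_really_probe and the two probe functions) is made up
for illustration, and it assumes nothing beyond "the core clears drvdata
when probe returns non-zero".

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stddef.h>

struct model_device {
	void *drvdata;
};

typedef int (*model_probe_fn)(struct model_device *dev);

/* Rough model of how the core reacts to the probe return value. */
int model_really_probe(struct model_device *dev, model_probe_fn probe)
{
	int ret;

	assert(dev != NULL && probe != NULL);

	ret = probe(dev);
	if (ret) {
		/* Failure path: the core drops the driver data itself. */
		dev->drvdata = NULL;
		return ret;
	}

	/* Success path: drvdata stays exactly as the driver left it. */
	return 0;
}

static int model_state; /* stands in for the padctl allocation */

/* Probe that fails properly with a negative error code. */
int model_probe_fails(struct model_device *dev)
{
	dev->drvdata = &model_state;
	return -22; /* same value as -EINVAL */
}

/* Probe that runs its cleanup but then returns 0 by mistake. */
int model_probe_cleanup_with_zero(struct model_device *dev)
{
	int err = 0;

	dev->drvdata = &model_state;
	/* ... jump to the cleanup label and release resources ... */
	return err; /* the core sees success and keeps the stale pointer */
}
```

With the first probe the model core clears drvdata on its own. With the
second one the stale pointer survives, which would be consistent with the
oops if that is really what is going on.
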

I suppose one possible explanation would be if for some reason we end up
here in the error cleanup path but with err == 0. Do you have more
information on when this happens so that I can repro and investigate?

Alternatively, if you've still got this set up, can you do a quick test
to see if "err" is indeed a negative error code when we get here?

Thierry