Hi Andrew L and Andrew H,

Sorry for the delayed response. I couldn't get to testing anything
until just now.

On Tue, Apr 23, 2024 at 03:07:15PM -0500, Andrew Halaney wrote:
> On Tue, Apr 23, 2024 at 03:52:35PM +0200, Andrew Lunn wrote:
> > On Mon, Apr 22, 2024 at 11:00:51PM -0500, Colin Foster wrote:
> >
> > In these two last transactions, the ACK bit is not set.
> >
> > > [    1.550471] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> > > [    1.550592] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
> > >
> > > Without the mdiodev->reset_state patch, I see the following:
> > >
> > > [    1.537817] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
> > > [    1.538165] davinci mdio reg is 0x20400007
> > > [    1.538426] davinci mdio reg is 0x2060c0f1
> >
> > Same as above.
> >
> > > [    1.558442] davinci mdio reg is 0x23a00090
> > > [    1.558717] davinci mdio reg is 0x20207809
> > > [    1.559681] davinci mdio reg is 0x21c0ffff
> >
> > In all these cases, we see the ACK bit set.
> >
> > So the PHY is responding to registers 2 and 3, the ID registers. But
> > it seems to be failing to respond to other registers. At a guess, i
> > would say it is still coming out of reset. Does the datasheet for the
> > LAN8710/LAN8720 say anything about how long a reset takes? Can you get
> > a logic analyser onto the reset line and MDIO bus and see how
> > different the timing is? It might be you need to add some delay values
> > to the reset in DT.

I don't think I'll be able to get onto those lines. But I do think this
is the right tree to bark up.

I also found some kernelci logs that suggest I'm not the only one seeing
this issue:

https://storage.kernelci.org/mainline/master/v6.9-rc5/arm/multi_v7_defconfig/gcc-10/lab-cip/baseline-beaglebone-black.html

There might be ways to navigate the kernelci database that I'm not aware
of, but I couldn't reasonably say "before 6.8 it didn't happen, and
after 6.8 it did."
I'm not sure that matters at this point though.

>
> For what its worth, I think that this theory makes sense if reverting the patch
> highlighted above makes this go away. Before that patch, you'd see a
> flow like this:
>
>     net: phy: mdio_device: Reset device only when necessary
>
>     Currently the phy reset sequence is as shown below for a
>     devicetree described mdio phy on boot:
>
>     1. Assert the phy_device's reset as part of registering
>     2. Deassert the phy_device's reset as part of registering
>     3. Deassert the phy_device's reset as part of phy_probe
>     4. Deassert the phy_device's reset as part of phy_hw_init
>
> Which means whatever the deassert time was tripled in
> practice before you got around to phy_hw_init() (which if I understand
> is when things start reporting no ACK above).
>
> I am not sure what devicetree upstream would be the one to look at for
> your beaglebone, but microchip's datasheet for the LAN8720A has
> "TABLE 5-8: POWER-ON NRST & ..." section detailing some reset requirements:
>
> https://ww1.microchip.com/downloads/en/devicedoc/00002165b.pdf
>
> If I read it right, assert time needs to be >= 100 us, and
> deassert... is not so clear to me unfortunately. Maybe for starters
> triple your value and see if things work ok (just based on the 3
> repeated deasserts going down to 1 with the patch applied)? Hopefully
> longer term the actual deassert timing can be confirmed.

I went all in and did a 100ms delay before returning from the resets of
3 and 4 you mention. Sure enough, everything worked!

It certainly should be understood and optimized. I added the linux-omap
list to this thread (please let me know if there were others I should've
CC'd on any of these emails).

Either way, thank you both for helping me understand this! I hope to be
able to fix the issue, but at the very least I hope it is considered
"reported".

Colin Foster
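
P.S. For anyone following along, here is a minimal sketch of how the
"davinci mdio reg is" values quoted above can be decoded. It assumes
those values are raw USERACCESS register contents, and that the field
layout is GO = bit 31, WRITE = bit 30, ACK = bit 29, REGADR = bits
25:21, PHYADR = bits 20:16, DATA = bits 15:0. That layout is my reading
of drivers/net/ethernet/ti/davinci_mdio.c and should be checked against
the TRM. The decode_useraccess() and UserAccess names are made up for
this illustration.

```python
from typing import NamedTuple


class UserAccess(NamedTuple):
    """Fields of one MDIO transaction word (layout is an assumption)."""
    go: bool      # bit 31: transaction still in flight
    write: bool   # bit 30: write (True) or read (False)
    ack: bool     # bit 29: PHY acknowledged the access
    reg: int      # bits 25:21: PHY register address
    phy: int      # bits 20:16: PHY address on the bus
    data: int     # bits 15:0: register data


def decode_useraccess(value: int) -> UserAccess:
    """Split a 32-bit value into the assumed USERACCESS fields."""
    return UserAccess(
        go=bool(value & (1 << 31)),
        write=bool(value & (1 << 30)),
        ack=bool(value & (1 << 29)),
        reg=(value >> 21) & 0x1F,
        phy=(value >> 16) & 0x1F,
        data=value & 0xFFFF,
    )


if __name__ == "__main__":
    # Values copied from the log lines quoted in this thread.
    for raw in (0x20400007, 0x2060C0F1, 0x23A00090, 0x20207809, 0x21C0FFFF):
        f = decode_useraccess(raw)
        print(f"0x{raw:08x}: ack={f.ack} phy={f.phy} "
              f"reg={f.reg} data=0x{f.data:04x}")
```

Under those assumptions, 0x20400007 and 0x2060c0f1 decode to reads of
registers 2 and 3 on PHY 0 with data 0x0007 and 0xc0f1, which lines up
with Andrew L's remark that the PHY is answering the ID registers.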