On Tue, Aug 31, 2021 at 6:16 AM Andrew Lunn <andrew@xxxxxxx> wrote: > > > > I must admit, my main problem at the moment is -rc1 in two weeks > > > time. It seems like a number of board with Ethernet switches will be > > > broken, that worked before. phy-handle is not limited to switch > > > drivers, it is also used for Ethernet drivers. So it could be, a > > > number of Ethernet drivers are also going to be broken in -rc1? > > > > Again, in those cases, based on your FEC example, fw_devlink=on > > actually improves things. > > Debatable. I did some testing. As expected some boards with Ethernet > switches are now broken. To clarify myself, I'm saying the patch to parse "ethernet" [8] will make things better with fw_devlink=on for FEC. Not sure if you tested that patch or not. And yes, "phy-handle" will make things worse for switches because it has two issues that need to be fixed. One is this deferred probe thing for which I gave a patch in the previous email and the other is what Marek reported (fix in the works). So can you revert "phy-handle" support and test with [8] and you should see things be better with fw_devlink=on. [8] - https://lore.kernel.org/netdev/CAGETcx9=AyEfjX_-adgRuX=8a0MkLnj8sy2KJGhxpNCinJu4yA@xxxxxxxxxxxxxx/ > Without fw_devlink=on, some boards are not > optimal, but they actually work. With it, they are broken. > > I did a bisect, and they have been broken since: > > ea718c699055c8566eb64432388a04974c43b2ea is the first bad commit > commit ea718c699055c8566eb64432388a04974c43b2ea > Author: Saravana Kannan <saravanak@xxxxxxxxxx> > Date: Tue Mar 2 13:11:32 2021 -0800 > > Revert "Revert "driver core: Set fw_devlink=on by default"" > > This reverts commit 3e4c982f1ce75faf5314477b8da296d2d00919df. > > Since all reported issues due to fw_devlink=on should be addressed by > this series, revert the revert. fw_devlink=on Take II. > > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx> > Link: https://lore.kernel.org/r/20210302211133.2244281-4-saravanak@xxxxxxxxxx > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > > drivers/base/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > So however it is fixed, it needs to go into stable, not just -rc1. Not sure what was the tip of the tree with which you bisected. For example, if phy-handle broke things, bisect could still point at this patch above depending on whether the first bisect is good/bad. Because reverting this patch effectively disabled phy-handle parsing too. To be clear, I'm not saying things aren't broken. I'm just pointing out some nuances with the bisect that we need to be aware of. > > Again, it's not a widespread problem as I explained before. > > fw_devlink=on has been the default for 2 kernel versions now. With no > > unfixed reported issues. > > Given that some Ethernet switches have been broken all that time, i > wonder what else has been broken? Normally, the kernel which is > release in December becomes the next LTS. It then gets picked up by > the distros and more wide spread tested. So it could be, you get a > flood of reports in January and February about things which are > broken. This is why i don't think you should be relying on bug > reports, you should be taking a more proactive stance and trying to > analyse the DTB blobs. As I mentioned earlier, just looking at DTB doesn't tell me much because the driver could still be fine depending on how it's written. Also, I don't have an easy way to do this. If I find a way, I'll do it. Btw, most bug reports that have been raised have been fixed with generic fixes that address general DT patterns. For example, fw_devlink has cycle detection built in, has support for devices that never get probed, etc. Enabling fw_devlink=on and handling bug reports with generic fixes has worked well so far to get fw_devlink to where it is today. I've tried to be very quick in responding to fw_devlink issues -- and that has worked out so far and hopefully it'll continue to work out. > I will spend some time trying out your proposed fix. See if they are a > quick fix for stable. Thanks for testing it out. And worst case, we'll revert phy-handle support. -Saravana