On Mon, Feb 5, 2024 at 12:06 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote: > > On Mon, Jan 22, 2024 at 5:46 PM Michael Pratt <mcpratt@xxxxx> wrote: > > > > Hi all, > > > > First off, if you are CC'ed here, it's likely because you were > > involved in or CC'ed in a patch series by Saravana Kannan for fw_devlink, > > and for consistency, I'm kindly asking for review from you all as well. > > If you think this may not affect your use cases, feel free to ignore this. > > I'm also CC'ing Christian and Rafal from the Openwrt project. > > > > This series is following up on some excellent work from Saravana [1] > > which is another patch series that includes some changes > > that helped make it possible for fw_devlink to work with MTD partitions > > that are a supplier fwnode by being a NVMEM provider. For an MTD partition > > to become an NVMEM provider, it must be registered as a platform device > > using of_platform_populate() or similar function > > which was done in a previous commit [2] > > but this also resulted in fw_devlink to apply > > to those fwnodes and their child fwnodes. > > > > That regression caused consumer devices to defer indefinitely > > while waiting for probing that will never happen or for device links > > to form from fwnode links with fwnodes that are not associated > > with any real device or driver that probes > > (e.g. describing the location of a MAC address). > > > > Saravana's patch series helped in two ways: > > First, by moving consumers from a child fwnode (in this case, > > the "nvmem-cells" compatible node) to an ancestor fwnode > > that represents the actual supplier device when that device probes, > > which handles the case where > > the supplier device has a probe attempt first. [3] [4] > > And secondly, by specifically marking "nvmem-cells" compatible nodes > > as populated during MTD partition parsing which also occurs during > > the supplier device probe, which handles both cases of initial probe order, > > but only for that node type. [5] > > > > However, there's some drawbacks to the second solution > > Oh, somehow missed this thread entirely until it saw some activity today. > > > from having to manually name which nodes need the workaround > > for the corner case with the compatible string. > > Most notably, that it's possible for this corner case > > to apply to other nodes types, but also the fact that initial probe order > > deciding whether or not everything probes in the intended order, if at all, > > through driver core alone is still an issue with fw_devlink, > > despite the fact that controlling probe order in driver core > > is one of it's goals. In other words, the real problem is in driver core, > > but the fix is in the mtd driver. > > It's been a while since I looked at MTD code, but the real problem is > not in driver core, but how it's used incorrectly by the MTD/nvmem > frameworks. Adding devices to a bus that'll never probe is wrong. I > think there's also a case of two devices being created off the same DT > node. While not technically wrong, it's weird when one of them never > probes. > > I'll take a closer look and respond to this series later. Hopefully > this week, but if not, then next week. > > As I said in the other patch, I don't like the series in the current > form because it's changing APIs in not so great ways. > > fwnode_link_add() is supposed to be super dumb and simple. It's > literally there just to avoid reparsing DT nodes multiple times. > Nothing more ever. It just denotes the "pointers" between fwnode or DT > nodes. > > The "smarts" should either be where fwnode links are converted into > device links (and not have to fix them up) or where nvmem-cells is > being parsed and converted to fwnode links. > > As I said in the other patch, let me take a closer look this week or > next and get back. These things needs several hours of uninterrupted > time for me to debug and unwind. As promised, I took a closer look and left some comments in Patch 1 and 2. Patch 1 is 100% broken/wrong so the series will not be accepted. Just for the sake of understanding, can you send a patch that'll add these additional compatible strings similar to how I handled nvmem-cells and show my how much worse it is than this series? > > > > > Unfortunately, with the Openwrt project > > we are experiencing this regression again > > by implementing the new NVMEM layout "fixed-layout" > > after it was accepted into the kernel. [6] > > This causes some subsystems of an SOC, like > > ethernet or wireless or both depending on hardware and DTS, > > and in some cases a completely different function like USB > > to never probe once again, and the temporary fix, like before, > > is by disabling fw_devlink. [7] > > > > Below is a simplified DTS of an Atheros device with the NVMEM layout: > > > > > > ð0 { > > nvmem-cells = <&macaddr_art_0>; > > nvmem-cell-names = "mac-address"; > > }; > > > > &wifi { > > nvmem-cells = <&caldata_art_1000>; > > nvmem-cell-names = "calibration"; > > }; > > > > &spi { > > status = "okay"; > > > > flash@0 { > > compatible = "jedec,spi-nor"; > > > > partitions { > > compatible = "fixed-partitions"; > > > > partition@ff0000 { > > label = "art"; > > reg = <0xff0000 0x010000>; > > read-only; > > > > nvmem-layout { > > compatible = "fixed-layout"; > > > > macaddr_art_0: macaddr@0 { > > reg = <0x0 0x6>; > > }; > > > > caldata_art_1000: caldata@1000 { > > reg = <0x1000 0x440>; > > }; > > }; > > }; > > }; > > }; > > }; In this example, can you walk me through the probe attempts/successes of the nvmem supplier and it's consumers and at what point does fw_devlink makes the wrong decision? You also said fw_devlink depends on the order in which devices probe to work correctly. This is definitely not the case/intention. So, if you can show me an example of that, I'll fix that too. I'm fairly certain this example will end up being a case of the nvmem framework creating pointless devices or using a "bus" when it needs a "class". But I don't want to assume. I'm asking all these question to understand your case better and maybe suggest a better fix that doesn't break fw_devlink or effectively disable it. Thanks, Saravana > > > > > > When the DTS is written this way, not only is there a different node > > involved for NVMEM, but this node is a new node that is yet another > > descendant in the tree. In other words, the "partition@ff0000" node > > used to be what designated this device as the NVMEM provider > > with the "nvmem-cells" compatible, so the node that represents > > the actual probing device is now 4 parent nodes up in the tree > > in this example instead of 3 parent nodes up in the tree as before. > > > > For the past year, and even before the "fixed-layout" issue came up, > > I had been experimenting with a way to handle these reverse probe order > > and linking of distant descendant node issues in a generic way instead of > > naming exceptions with the compatible string, and this series is the > > culmination of those efforts. It may look strange at first, > > but there are a myriad set of cases to handle and other reasons > > that led me to develop this approach of using existing bad device links > > in order to find the correct fwnode to link to, and then redo > > the relevant links for that consumer device altogether. > > I'm concerned that doing this another way would be a much more massive > > project that would basically rewrite what the fw_devlink feature does. > > Or perhaps there would have to be a new, third form of device links > > that would be a "placeholder" before it becomes a fwnode link. > > Eventually, I came to the conclusion that > > there simply is not enough information to form the correct fwnode link > > before the real supplier device has a successful probe. > > > > Some of the changes proposed here are made on the extreme side of caution, > > for example, checking for null dereference when it might not be necessary, > > and reducing the activity of some functions in order to reduce > > the amount of assumptions taking place in the middle of driver core > > in cases where the new functions proposed here handles that just as well > > and closer to a possible probe defer event > > (e.g. not declaring a fwnode as NOT_DEVICE before > > a probe attempt is expected to have happened). > > > > I have tried to make the commit messages as self-explanatory as I can, > > but they might have ended up a little too verbose, and therefore confusing > > but I'm ready to explain whatever has not been explained well already. > > Lastly, this is my first attempt at sending a larger change to the kernel, > > so I appreciate your time and patience very much. > > > > MCP > > > > > > [1] https://lore.kernel.org/lkml/20230207014207.1678715-1-saravanak@xxxxxxxxxx/ > > > > [2] bcdf0315a61a29eb753a607d3a85a4032de72d94 > > > > [3] 3a2dbc510c437ca392516b0105bad8e7970e6614 > > > > [4] 411c0d58ca6faa9bc4b9f5382118a31c7bb92a6f > > > > [5] fb42378dcc7f247df56f0ecddfdae85487495fbc > > > > [6] 27f699e578b1a72df89dfa3bc42e093a01dc8d10 > > > > [7] https://github.com/openwrt/openwrt/commit/703d667a0cdf6dfa402c08e72d0c77a257ca5009 > > > > > > Michael Pratt (4): > > driver core: fw_devlink: Use driver to determine probe ability > > driver core: fw_devlink: Link to supplier ancestor if no device > > driver core: fw_devlink: Add function device_links_fixup_suppliers() > > mtd: mtdpart: Allow fwnode links to NVMEM compatible fwnodes > > > > drivers/base/base.h | 1 + > > drivers/base/core.c | 144 ++++++++++++++++++++++++++++++++++++++--- > > drivers/base/dd.c | 2 + > > drivers/mtd/mtdpart.c | 10 --- > > include/linux/fwnode.h | 4 ++ > > 5 files changed, 143 insertions(+), 18 deletions(-) > > > > > > base-commit: b0d326da462e20285236e11e4cbc32085de9f363 > > -- > > 2.30.2 > > > >