On Thu, Oct 24, 2024 at 3:55 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote: > > On Wed, Oct 23, 2024 at 1:52 AM Tomi Valkeinen > <tomi.valkeinen@xxxxxxxxxxxxxxxx> wrote: > > > > Hi, > > > > On 22/10/2024 19:07, Saravana Kannan wrote: > > > On Tue, Oct 22, 2024 at 12:51 AM Tomi Valkeinen > > > <tomi.valkeinen@xxxxxxxxxxxxxxxx> wrote: > > >> > > >> Hi, > > >> > > >> On 22/10/2024 02:29, Saravana Kannan wrote: > > >>> Hi Tomi, > > >>> > > >>> Sorry it took a while to get back. > > >>> > > >>> On Mon, Sep 16, 2024 at 4:52 AM Tomi Valkeinen > > >>> <tomi.valkeinen@xxxxxxxxxxxxxxxx> wrote: > > >>>> > > >>>> Hi, > > >>>> > > >>>> We have an issue where two devices have dependencies to each other, > > >>>> according to drivers/base/core.c's fw_devlinks, and this prevents them > > >>>> from probing. I've been adding debugging to the core.c, but so far I > > >>>> don't quite grasp the issue, so I thought to ask. Maybe someone can > > >>>> instantly say that this just won't work... > > >>>> > > >>>> So, we have two devices, DSS (display subsystem) and an LVDS panel. The > > >>>> DSS normally outputs parallel video from its video ports (VP), but it > > >>>> has an integrated LVDS block (OLDI, Open LVDS Display Interface). The > > >>>> OLDI block takes input from DSS's parallel outputs. The OLDI is not > > >>>> modeled as a separate device (neither in the DT nor in the Linux device > > >>>> model) as it has no register space, and is controlled fully by the DSS. > > >>>> > > >>>> To support dual-link LVDS, the DSS has two OLDI instances. They both > > >>>> take their input from the same parallel video port, but each OLDI sends > > >>>> alternate lines forward. So for a dual-link setup the connections would > > >>>> be like this: > > >>>> > > >>>> +-----+-----+ +-------+ +----------+ > > >>>> | | | | | | | > > >>>> | | VP1 +----+--->| OLDI0 +-------->| | > > >>>> | | | | | | | | > > >>>> | DSS +-----+ | +-------+ | Panel | > > >>>> | | | | | | | | > > >>>> | | VP2 | +--->| OLDI1 +-------->| | > > >>>> | | | | | | | > > >>>> +-----+-----+ +-------+ +----------+ > > >>>> > > >>>> As the OLDI is not a separate device, it also does not have an > > >>>> independent device tree node, but rather it's inside DSS's node. The DSS > > >>>> parallel outputs are under a normal "ports" node, but OLDI ports are > > >>>> under "oldi-txes/ports" (see below for dts to clarify this). > > >>>> > > >>>> And I think (guess...) this is the root of the issue we're seeing, as it > > >>>> means the following, one or both of which might be the reason for this > > >>>> issue: > > >>>> > > >>>> - OLDI fwnodes don't have an associated struct device *. I think the > > >>>> reason is that the OLDI media graph ports are one level too deep in the > > >>>> hierarchy. So while the DSS ports are associated with the DSS device, > > >>>> OLDI ports are not. > > >>> > > >>> This is the root cause of the issue in some sense. fw_devlink doesn't > > >>> know that DSS depends on the VP. In the current DT, it only appears as > > >>> if the OLDI depends on VP. See further below for the fix. > > >>> > > >>>> > > >>>> - The VP ports inside the DSS point to OLDI ports, which are also inside > > >>>> DSS. So ports from a device point to ports in the same device (and back). > > >>>> > > >>>> If I understand the fw_devlink code correctly, in a normal case the > > >>>> links formed with media graphs are marked as a cycle > > >>>> (FWLINK_FLAG_CYCLE), and then ignored as far as probing goes. > > >>>> > > >>>> What we see here is that when using a single-link OLDI panel, the panel > > >>>> driver's probe never gets called, as it depends on the OLDI, and the > > >>>> link between the panel and the OLDI is not a cycle. > > >>>> > > >>>> The DSS driver probes, but the probe fails as it requires all the panel > > >>>> devices to have been probed (and thus registered to the DRM framework) > > >>>> before it can finish its setup. > > >>>> > > >>>> With dual-link, probing does happen and the drivers work. But I believe > > >>>> this is essentially an accident, in the sense that the first link > > >>>> between the panel and the OLDI still blocks the probing, but the second > > >>>> links allows the driver core to traverse the devlinks further, causing > > >>>> it to mark the links to the panel as FWLINK_FLAG_CYCLE (or maybe it only > > >>>> marks one of those links, and that's enough). > > >>>> > > >>>> If I set fw_devlink=off as a kernel parameter, the probing proceeds > > >>>> successfully in both single- and dual-link cases. > > >>>> > > >>>> Now, my questions is, is this a bug in the driver core, a bug in the DT > > >>>> bindings, or something in between (DT is fine-ish, but the structure is > > >>>> something that won't be supported by the driver core). > > >>>> > > >>>> And a follow-up question, regardless of the answer to the first one: > > >>>> which direction should I go from here =). > > >>>> > > >>>> The device tree data (simplified) for this is as follows, first the > > >>>> dual-link case, then the single-link case: > > >>>> > > >>>> /* Dual-link */ > > >>>> > > >>>> dss: dss@30200000 { > > >>>> compatible = "ti,am625-dss"; > > >>>> > > >>>> oldi-txes { > > >>>> oldi0: oldi@0 { > > >>>> oldi0_ports: ports { > > >>>> port@0 { > > >>>> oldi_0_in: endpoint { > > >>>> remote-endpoint = <&dpi0_out0>; > > >>>> }; > > >>>> }; > > >>>> > > >>>> port@1 { > > >>>> oldi_0_out: endpoint { > > >>>> remote-endpoint = <&lcd_in0>; > > >>>> }; > > >>>> }; > > >>>> }; > > >>>> }; > > >>>> > > >>>> oldi1: oldi@1 { > > >>>> oldi1_ports: ports { > > >>>> port@0 { > > >>>> oldi_1_in: endpoint { > > >>>> remote-endpoint = <&dpi0_out1>; > > >>>> }; > > >>>> }; > > >>>> > > >>>> port@1 { > > >>>> oldi_1_out: endpoint { > > >>>> remote-endpoint = <&lcd_in1>; > > >>>> }; > > >>>> }; > > >>>> }; > > >>>> }; > > >>>> }; > > >>>> > > >>>> dss_ports: ports { > > >>>> port@0 { > > >>>> dpi0_out0: endpoint@0 { > > >>>> remote-endpoint = <&oldi_0_in>; > > >>>> }; > > >>>> dpi0_out1: endpoint@1 { > > >>>> remote-endpoint = <&oldi_1_in>; > > >>>> }; > > >>>> }; > > >>>> }; > > >>>> }; > > >>>> > > >>>> display { > > >>>> compatible = "microtips,mf-101hiebcaf0", "panel-simple"; > > >>> > > >>> In here, add this new property that I added some time back. > > >>> > > >>> post-init-providers = <&oldi-txes>; > > >> > > >> Thanks! This helps: > > >> > > >> post-init-providers = <&oldi0>; > > >> > > >> or for dual-link: > > >> > > >> post-init-providers = <&oldi0>, <&oldi1>; > > >> > > >>> This tells fw_devlink that VP doesn't depend on this node for > > >>> initialization/probing. This property is basically available to break > > >>> cycles in DT and mark one of the edges of the cycles as "not a real > > >>> init dependency". > > >>> > > >>> You should do the same for the single link case too. > > >> > > >> While this helps, it's not very nice... Every new DT overlay that uses > > >> OLDI display needs to have these. > > > > > > Actually, taking a closer look at the DT and assuming I am visualizing > > > it correctly in my head, fw_devlink should notice the cycle between > > > oldi-txes and display and shouldn't block display from probing. Can > > > you check the cycle detection code and see where it's bailing out > > > early and not marking the fwnode link with the CYCLE flag? > > > > > > __fw_devlink_relax_cycles() is where you want to look. There are a > > > bunch of debug log messages inside it and around where it's called > > > from. > > > > I'm not quite sure how to read the debug messages. I have attached three > > kernel logs, with the debug prints enabled in drivers/base/core.c. The > > "fixed" is the one with the post-init-providers. > > > > I load the display drivers as modules after the main boot has happened, > > and at the end of the logs I have the kernel prints when I load the > > modules. The single-link.txt also shows the debugfs/devices_deferred file. > > > > The relevant strings to search are "dss", "oldi" and "display" (display > > is the panel). > > > > So... All devlinks are supplier-consumer links. How are those created > > with an OF media graph, as there's no clear supplier nor consumer. In > > this particular case I see that display is marked as a consumer of oldi, > > but also dss is marked as a consumer of oldi. Is this just, essentially, > > random? > > No, the cyclic links you see are "sync state only" links. They don't > enforce any ordering other than "sync_state()" callbacks. So, it's not > random. In this example, it just ensures that the sync_state() > callbacks of display and dss will only get called after both those > devices probe. If it's confusing, try to understand what > fw_devlink=permissive does. When we see a cycle, we just put all the > devices in the cycle in "permissive" mode wrt each other. > > > Also, as there's no separate device for OLDI, I don't see oldi at all in > > /sys/class/devlink/. But what I see there is a bit odd... > > > > For dual link I get: > > > > platform:display--platform:30200000.dss > > > > which, I guess, makes sense. But for single link fixed case, I don't > > have anything there... > > > > > Also, can you check debugfs/devices_deferred, or the > > > "wait_for_supplier" file under /sys/devices/..../display/ to make sure > > > it's actually stuck on oldi-txes? Just to make cure it's not some > > > other corner case that's triggered by oldi-txes? > > > > > >> I'm still confused about why this is needed. OF graphs are _always_ > > >> two-way links. Doesn't that mean that OF graphs never can be used for > > >> dependencies, as they go both ways? > > > > > > Good question :) Yes, they'll always be ignored as cycles. But with > > > post-init-providers, it's actually better to add it so that cycles are > > > broken and better ordering is enforced. See my talk at LPC referee > > > talk about fw_devlink to see all the benefits you get from it. :) > > > > Thanks for the pointer! It was interesting and I now understand the > > whole devlink thing better, although I have to say the details still > > escape me... =) > > > > Also, isn't post-init-providers describing software behavior, not > > hardware? It doesn't sound like something we should have in the DT. > > Not really. In real hardware, there can't be a cycle for > initialization. And the current DT properties don't tell us which link > is not needed for initialization. And post-init-providers is for > describing which dependency is not needed for hardware initialization. > > Take a look at the docs in the dt-schema for post-init-providers and > that might help. > > > >> If so, shouldn't we just always > > >> ignore all OF graphs for dependency checking? > > > > > > There are cases when two devices A and B have remote-endpoints between > > > them and ALSO have for example a gpio dependency between them. Where > > > the gpio is the "post-init-supplier". If we don't parse > > > remote-endpoint and mark the cycles, cases like these don't get to > > > probe. > > > > I'm sorry, I don't understand this. If we have 1) A and B with a (one > > way) gpio dependency, and 2) A and B with a (one way) gpio dependency > > _and_ two way media graph dependency, shouldn't the cases behave > > identically, as the graph dependency should just be ignored? > > Let's assume in both cases the A and B point to each other using > remote-endpoints. Let's also assume in both cases A says it needs a > GPIO from B. But the real dependency for probing is that B needs A to > probe first (due to the remote end point). > > The only difference between case 1 and 2 is whether fw_devlink > supports remote-endpoint parsing. > > In case 1, fw_devlink has no way of knowing there is a cycle between A > and B because it doesn't know that B depends on A (due to > remote-endpoint). So it can't break any cycles and permanently > prevents A and B from probing because all it sees is that it needs B > to provide a GPIO to A. > > In case 2, fw_devlink sees that B also depends on A. So there's a > cycle and marks them both as part of a cycle. And now A and B can > probe. > > Also, forgot to say this last time: we need to support remote-endpoint > to ensure sync_state() callbacks work correctly when X and Y have > remote-endpoints between them. > > > Or maybe I don't understand the example case at all... Why would the > > gpio be a post-init-supplier? Isn't gpio something you want at init time? > > Apparently not. There are examples of GPIOs that are used only > acquired (get()) at run time in reaction to some event. > > Anyway, with all that said, I think I have a solution in mind for you > that should allow devices to probe without post-init-providers. But > I'll need to make the changes and then test it. Let me send it to you > in a few days. Btw, you should always use post-init-providers to break > cycles and do better enforcing of probe/suspend/resume/shutdown order. Hey Tomi, Can you give this a shot and let me know if it fixes your issue? I have more instruction in the patch too. https://lore.kernel.org/all/20241025223721.184998-1-saravanak@xxxxxxxxxx/T/#u -Saravana