Quoting Daniel Vetter (2022-02-08 04:53:59)
> On Mon, Jan 31, 2022 at 05:34:26PM +0100, Greg Kroah-Hartman wrote:
> > On Mon, Jan 31, 2022 at 04:15:09PM +0100, Daniel Vetter wrote:
> > > On Mon, Jan 31, 2022 at 2:48 PM Greg Kroah-Hartman
> > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jan 27, 2022 at 12:01:08PM -0800, Stephen Boyd wrote:
> > > > > The component framework only provides 'bind' and 'unbind' callbacks to
> > > > > tell the host driver that it is time to assemble the aggregate driver
> > > > > now that all the components have probed. The component framework doesn't
> > > > > attempt to resolve runtime PM or suspend/resume ordering, and explicitly
> > > > > mentions this in the code. This lack of support leads to some pretty
> > > > > gnarly usages of the 'prepare' and 'complete' power management hooks in
> > > > > drivers that host the aggregate device, and it fully breaks down when
> > > > > faced with ordering shutdown between the various components, the
> > > > > aggregate driver, and the host driver that registers the whole thing.
> > > > >
> > > > > In a concrete example, the MSM display driver at drivers/gpu/drm/msm is
> > > > > using 'prepare' and 'complete' to call the drm helpers
> > > > > drm_mode_config_helper_suspend() and drm_mode_config_helper_resume()
> > > > > respectively, so that it can move the aggregate driver suspend/resume
> > > > > callbacks to be before and after the components that make up the drm
> > > > > device call any suspend/resume hooks they have. This only works as long
> > > > > as the component devices don't do anything in their own 'prepare' and
> > > > > 'complete' callbacks. If they did, then the ordering would be incorrect
> > > > > and we would be doing something in the component drivers before the
> > > > > aggregate driver could do anything. Yuck!
> > > > >
> > > > > Similarly, when trying to add shutdown support to the MSM driver we run
> > > > > across a problem where we're trying to shutdown the drm device via
> > > > > drm_atomic_helper_shutdown(), but some of the devices in the encoder
> > > > > chain have already been shutdown. This time, the component devices
> > > > > aren't the problem (although they could be if they did anything in their
> > > > > shutdown callbacks), but there's a DSI to eDP bridge in the encoder
> > > > > chain that has already been shutdown before the driver hosting the
> > > > > aggregate device runs shutdown. The ordering of driver probe is like
> > > > > this:
> > > > >
> > > > >  1. msm_pdev_probe() (host driver)
> > > > >  2. DSI bridge
> > > > >  3. aggregate bind
> > > > >
> > > > > When it comes to shutdown we have this order:
> > > > >
> > > > >  1. DSI bridge
> > > > >  2. msm_pdev_shutdown() (host driver)
> > > > >
> > > > > and so the bridge is already off, but we want to communicate to it to
> > > > > turn things off on the display during msm_pdev_shutdown(). Double yuck!
> > > > > Unfortunately, this time we can't split shutdown into multiple phases
> > > > > and swap msm_pdev_shutdown() with the DSI bridge.
> > > > >
> > > > > Let's make the component_master_ops into an actual device driver that has
> > > > > probe/remove/shutdown functions. The driver will only be bound to the
> > > > > aggregate device once all component drivers have called component_add()
> > > > > to indicate they're ready to assemble the aggregate driver. This allows
> > > > > us to attach shutdown logic (and in the future runtime PM logic) to the
> > > > > aggregate driver so that it runs the hooks in the correct order.
> > > >
> > > > I know I asked before, but I can not remember the answer.
> > > >
> > > > This really looks like it is turning into the aux bus code. Why can't
> > > > you just use that instead here for this type of thing?
> > > > You are creating another bus and drivers for that bus that are
> > > > "fake" which is great, but that's what the aux bus code was supposed
> > > > to help out with, so we wouldn't have to write more of these.
> > > >
> > > > So, if this really is different, can you document it here so I remember
> > > > next time you resend this patch series?
> > >
> > > aux takes a device and splits it into a lot of sub-devices, each with
> > > their own driver.
> > >
> > > This takes a pile of devices, and turns it into a single logical
> > > device with a single driver.
> > >
> > > So aux is 1:N, component is N:1.
> > >
> > > And yes you asked this already, I typed this up already :-)
> >
> > Ok, thanks. But then why is a bus needed if there's a single driver?
> > I guess a bus for that driver? So one bus, one driver, and one device?
>
> Maybe? I have honestly no idea how this should be best modelled in the
> linux device model.

There can be one driver with multiple aggregate devices bound to it. This
happens in the MediaTek SMMU (IOMMU) code, which has two aggregate
devices.

We need a bus so that there is a driver to attach power management
operations and a shutdown hook to, and that driver knows about the entire
graphics card/encoder chain. Otherwise there isn't a good place to insert
the call that walks the display hardware and shuts it down, i.e.
drm_atomic_helper_shutdown(). Today an i2c device for a display bridge
can't be turned off because the whole i2c bus has already been shut down
by the time we call drm_atomic_helper_shutdown(). That happens because the
platform device that makes the call probes long before the i2c bridge
does, so it is shut down after the bridge.

Could we instead attach a shutdown hook and dev_pm_ops to the drm class
structure, and then have some DRM API that lets drivers opt into using the
simple shutdown helper?
That would avoid making yet another bus and driver, because my high-level
understanding of 'struct class drm_class' is that it represents the
graphics card, and the graphics card isn't created until all the devices
in the display pipeline have probed and checked in with the component
layer.

> > I think we need better documentation here...
>
> https://dri.freedesktop.org/docs/drm/driver-api/component.html?highlight=component_del#component-helper-for-aggregate-drivers
>
> There's a kerneldoc overview for component, but it's for driver authors
> that want to use component to glue different hw pieces into a logical
> driver, so it skips over these internals.
>
> And I'm honestly not sure how we want to leak implementation internals
> like the bus/driver/device structure to users of component.c.

What are the next steps here? Do I need to document the component code
further in kernel-doc? I can add kernel-doc for things like the
component_match_array and aggregate_device structures and highlight how
the component framework differs from the aux bus.
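To make the ordering argument concrete, here is a toy user-space model.
It is not kernel code, and every toy_* name in it is hypothetical. It
assumes, as a simplification of what the driver core actually does, that
shutdown visits devices in the reverse of the order in which they finished
probing. It also assumes that the aggregate device only "probes" once its
last component calls the stand-in for component_add().

```c
#include <assert.h>
#include <string.h>

/*
 * Toy user-space model, NOT kernel code: every toy_* name is hypothetical.
 * Simplifying assumption: shutdown visits devices in the reverse of the
 * order in which they finished probing.
 */
#define TOY_MAX_DEVS 16

static const char *probed[TOY_MAX_DEVS]; /* device names, in probe order */
static int nprobed;

/* Record that a device finished probing. */
static void toy_probed(const char *name)
{
	assert(nprobed < TOY_MAX_DEVS);
	probed[nprobed++] = name;
}

/* An aggregate device that waits for a fixed number of components. */
struct toy_aggregate {
	const char *name;
	int expected;   /* components required before the driver binds */
	int added;      /* components that have checked in so far */
	int bound;      /* set once the aggregate driver has "probed" */
};

/*
 * Stand-in for component_add(): the component itself has probed, and the
 * last expected component causes the aggregate device to probe too.
 */
static void toy_component_add(struct toy_aggregate *agg, const char *comp)
{
	toy_probed(comp);
	if (++agg->added == agg->expected) {
		toy_probed(agg->name);
		agg->bound = 1;
	}
}

/* Position of @name in shutdown order (0 = first to shut down), or -1. */
static int toy_shutdown_pos(const char *name)
{
	for (int i = nprobed - 1; i >= 0; i--)
		if (strcmp(probed[i], name) == 0)
			return nprobed - 1 - i;
	return -1;
}
```

Feeding it a sequence shaped like the MSM case (host platform device,
first component, DSI bridge, last component, with the hypothetical names
"msm-host", "comp0", "dsi-bridge" and "comp1") gives the shutdown order
drm-aggregate, comp1, dsi-bridge, comp0, msm-host. The aggregate device
goes first, ahead of the bridge, and that is where a shutdown hook calling
drm_atomic_helper_shutdown() would want to run. The host device still
comes last.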