Re: [Freedreno] [PATCH 0/9] drm/msm: Avoid possible infinite probe deferral and speed booting

Rob Herring <robh+dt@xxxxxxxxxx> · Tue, 14 Jul 2020 16:13:02 -0600

On Tue, Jul 14, 2020 at 10:33 AM Jeffrey Hugo <jeffrey.l.hugo@xxxxxxxxx> wrote:
>
> On Mon, Jul 13, 2020 at 5:50 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Mon, Jul 13, 2020 at 1:25 PM Rob Herring <robh+dt@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jul 13, 2020 at 9:08 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Mon, Jul 13, 2020 at 7:11 AM Rob Herring <robh+dt@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Jul 10, 2020 at 5:02 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > I found that if I ever had a little mistake in my kernel config,
> > > > > > or device tree, or graphics driver that my system would sit in a loop
> > > > > > at bootup trying again and again and again.  An example log was:
> > > > >
> > > > > Why do we care about optimizing the error case?
> > > >
> > > > It actually results in a _fully_ infinite loop.  That is: if anything
> > > > small causes a component of DRM to fail to probe then the whole system
> > > > doesn't boot because it just loops trying to probe over and over
> > > > again.  The messages I put in the commit message are printed over and
> > > > over and over again.
> > >
> > > Sounds like a bug as that's not what should happen.
> > >
> > > If you defer during boot (initcalls), then you'll be on the deferred
> > > list until late_initcall and everything is retried. After
> > > late_initcall, only devices getting added should trigger probing. But
> > > maybe the adding and then removing a device is causing a re-trigger.
> >
> > Right, I'm nearly certain that the adding and then removing is causing
> > a re-trigger.  I believe the loop would happen for any case where we
> > have a probe function that:
> >
> > 1. Adds devices.
> > 2. After adding devices it decides that it needs to defer.
> > 3. Removes the devices it added.
> > 4. Return -EPROBE_DEFER from its probe function.
> >
> > Specifically from what I know about how -EPROBE_DEFER works I'm not
> > sure how it wouldn't cause an infinite loop in that case.
> >
> > Perhaps the missing part of my explanation, though, is why it never
> > gets out of this infinite loop.  In my case I purposely made the
> > bridge chip "ti-sn65dsi86.c" return an error (-EINVAL) in its probe
> > every time.  Obviously I wasn't going to get a display up like this,
> > but I just wanted to not loop forever at bootup.  I tracked down
> > exactly why we get an - EPROBE_DEFER over and over in this case.
> >
> > You can see it in msm_dsi_host_register().  If some components haven't
> > shown up when that function runs it will _always_ return
> > -EPROBE_DEFER.
> >
> > In my case, since I caused the bridge to fail to probe, those
> > components will _never_ show up.  That means that
> > msm_dsi_host_register() will _always_ return -EPROBE_DEFER.
> >
> > I haven't dug through all the DRM code enough, but it doesn't
> > necessarily seem like the wrong behavior.  If the bridge driver or a
> > panel was a module then (presumably) they could show up later and so
> > it should be OK for it to defer, right?
> >
> > So with all that, it doesn't really feel like this is a bug so much as
> > it's an unsupported use case.  The current deferral logic simply can't
> > handle the case we're throwing at it.  You cannot return -EPROBE_DEFER
> > if your probe function adds devices each time through the probe
> > function.
> >
> > Assuming all the above makes sense, that means we're stuck with:
> >
> > a) This patch series, which makes us not add devices.
> >
> > b) Some other patch series which rearchitects the MSM graphics stack
> > to not return -EPROBE_DEFER in this case.
>
> This isn't a MSM specific issue.  This is an issue with how the DSI
> interface works, and how software is structured in Linux.  I would
> expect that pretty much any DSI host in the kernel would have some
> version of this issue.
>
> The problem is that DSI is not "hot pluggable", so to give the DRM
> stack the info it needs, we need both the DSI controller (aka the MSM
> graphics stack in your case), and the thing it connects to (in your
> case, the TI bridge, normally the actual panel) because the DRM stack
> expects that if init completes, it has certain information
> (resolution, etc), and some of that information is in the DSI
> controller, and some of it is on the DSI device.

Ah yes, DRM's lack of hot-plug and discrete component support... Is
that not improved with some of the bridge rework?

Anyways, given there is a child dependency on the parent, I don't
think we should work-around DRM deficiencies in DT.

BTW, There's also a deferred probe timeout you can use which stops
deferring probe some number of seconds after late_initcall.

Rob