Re: [PATCH] driver core: Disable late probes by default

Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> · Wed, 21 Oct 2015 14:01:52 -0700

On Wed, Oct 21, 2015 at 03:53:31PM -0500, Rob Herring wrote:
> On Wed, Oct 21, 2015 at 1:45 PM, Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, Oct 21, 2015 at 01:09:55PM -0500, Rob Herring wrote:
> >> On Wed, Oct 21, 2015 at 11:06 AM, Greg Kroah-Hartman
> >> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >> > On Wed, Oct 21, 2015 at 05:53:13PM +0200, Tomeu Vizoso wrote:
> >> >> On 21 October 2015 at 17:14, Greg Kroah-Hartman
> >> >> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >> >> > On Wed, Oct 21, 2015 at 04:35:58PM +0200, Tomeu Vizoso wrote:
> >> >> >> On 21 October 2015 at 05:39, Greg Kroah-Hartman
> >> >> >> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >> >> >> > On Tue, Oct 20, 2015 at 06:17:39PM +0200, Tomeu Vizoso wrote:
> >> >> >> >> On 20 October 2015 at 16:05, Greg Kroah-Hartman
> >> >> >> >> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> [...]
> 
> >> >> >> >> Because of that, ChromeOS had to use their own bindings for the panel
> >> >> >> >> node so that the panel probe wouldn't be deferred, introducing a
> >> >> >> >> sizable delta that is a barrier to rebasing on newer mainline releases
> >> >> >> >> and for vendors to upstream their HW adaptation for chrome devices.
> >> >> >> >
> >> >> >> > 1.5 second delay is crazy (again, my laptop boots to X in less time than
> >> >> >> > that),
> >> >> >>
> >> >> >> 1.5 seconds isn't crazy at all for the kernel to initialize all the
> >> >> >> devices in an embedded board. That's the current state of affairs
> >> >> >> today.
> >> >> >
> >> >> > Then someone needs to fix that, that really is crazy.  What takes so
> >> >> > long here?  Why aren't you using async probing to do things in parallel
> >> >> > when you need to sleep in device probe (I'm hoping you are sleeping in
> >> >> > device probe, otherwise that's really broken)?
> >> >>
> >> >> I'm a bit surprised now. During all the time that I have been pushing
> >> >> this forward I have been regularly testing on more than a dozen boards
> >> >> with different SoCs, and 1.5 seconds to probe all the devices isn't
> >> >> that much. This is basically due to having to wait for the hardware a
> >> >> bit here and there, and to the sheer number of devices involved.
> >> >>
> >> >> Of course people have been looking at speeding up boot on ARM devices
> >> >> for years now, and this is what we have come up with so far.
> >> >>
> >> >> > Have you used the tools we have to find where the time is being spent?
> >> >>
> >> >> I have to admit that my starting point was that probe order was the
> >> >> cause of the problem, and I haven't profiled the whole boot process,
> >> >> but I don't see how probe ordering would become irrelevant unless we
> >> >> got total probing time down to 200ms. And that would give us a
> >> >> fabulously fast boot, which I don't think is as realistic as you seem
> >> >> to believe.
> >> >
> >> > So you aren't using the tools that we have today that were created years
> >> > ago, to help to reduce boot time problems like this and instead work on
> >> > changing the driver core to try to guess at what the real issue is here?
> >> >
> >> > Come on, until you really know where you are taking so long, how can you
> >> > know what you need to fix?  I strongly recommend doing that here first,
> >> > that's why those tools were written in the first place.
> >>
> >> For something everyone has been (or should have been) doing for years, there is
> >> surprisingly zero information I can find. It is perf timechart you are
> >> talking about, right? Everything I find on it is all after userspace
> >> starts. I know perf has command line options, but I never could get it
> >> to do what I wanted (which was dumping events up until a boot hang).
> >
> > scripts/bootgraph.pl combined with 'initcall_debug' on the kernel
> > command line.  Landed in Linus's tree back in 2008; perf is not needed.
> 
> Okay, well that I have used. I was thinking something more granular
> than that. It will get you driver probe times, but the problem could
> be some underlying dependency causing the actual delay. For example,
> a driver enabling its regulator which happens to be connected via a
> bit-banged I2C bus. Obviously we can dig down from there, but it is
> not as simple as enabling the tool, running it, and instantly
> identifying the problem.
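
For a plain-text ranking instead of the SVG that scripts/bootgraph.pl draws, here is a minimal sketch (not part of the kernel tree) that lists the slowest initcalls from a dmesg log captured with initcall_debug. It assumes the "initcall <fn> returned <ret> after <n> usecs" line format printed by init/main.c in kernels of this era; the function name slowest_initcalls and the sample log lines are hypothetical. A built-in driver whose devices already exist probes synchronously during registration, so its probe time lands inside its initcall, and time spent in an underlying dependency (such as a regulator behind a bit-banged I2C bus) is likely folded into whichever call triggered it.

```python
import re
from typing import List, Tuple

# Matches the per-initcall timing line printed when booting with
# initcall_debug, e.g.
#   "[    0.351600] initcall bar_init+0x0/0x28 returned 0 after 250000 usecs"
# The exact format is an assumption based on init/main.c of this era.
INITCALL_RE = re.compile(r"initcall (\S+) returned -?\d+ after (\d+) usecs")


def slowest_initcalls(dmesg: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n slowest initcalls as (function name, microseconds)."""
    timings = []
    for line in dmesg.splitlines():
        m = INITCALL_RE.search(line)
        if m is None:
            continue
        # Drop the "+offset/size" suffix that %pF appends to the symbol.
        name = m.group(1).split("+", 1)[0]
        timings.append((name, int(m.group(2))))
    # Longest first: these are the calls worth digging into.
    timings.sort(key=lambda t: t[1], reverse=True)
    return timings[:n]


if __name__ == "__main__":
    # Hypothetical log excerpt, invented for illustration.
    sample = """\
[    0.100000] calling  foo_init+0x0/0x40 @ 1
[    0.101500] initcall foo_init+0x0/0x40 returned 0 after 1500 usecs
[    0.101600] calling  bar_init+0x0/0x28 @ 1
[    0.351600] initcall bar_init+0x0/0x28 returned 0 after 250000 usecs
[    0.351700] calling  baz_init+0x0/0x10 @ 1
[    0.351750] initcall baz_init+0x0/0x10 returned -19 after 50 usecs
"""
    for name, usecs in slowest_initcalls(sample, n=2):
        print(f"{usecs:>8} us  {name}")
```

On a real board the sample would be replaced by the captured dmesg text; the graphical equivalent is `dmesg | perl scripts/bootgraph.pl > output.svg`, which also needs CONFIG_PRINTK_TIME enabled.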

But it will show you the time taken in the driver probe calls, which is
what you need to know.  And it will show what order things ended up
happening in, which is also what you need to see (i.e., the regulator
that came after the i2c controller).
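
The ordering can be recovered from the same log. A minimal sketch, again assuming the initcall_debug line formats and CONFIG_PRINTK_TIME timestamps of this era, pairs each "calling" line with its matching "returned" line and lists the calls in the order they started. The names initcall_timeline, example_i2c_init and example_regulator_init are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Line formats assumed from initcall_debug output with CONFIG_PRINTK_TIME=y:
#   "[    0.101600] calling  bar_init+0x0/0x28 @ 1"
#   "[    0.351600] initcall bar_init+0x0/0x28 returned 0 after 250000 usecs"
CALLING_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+calling\s+(\S+)\s+@\s+\d+")
RETURNED_RE = re.compile(r"initcall (\S+) returned -?\d+ after (\d+) usecs")


def initcall_timeline(dmesg: str) -> List[Tuple[str, float, int]]:
    """Return (name, start_seconds, duration_usecs) in the order calls started."""
    starts: Dict[str, float] = {}
    timeline = []
    for line in dmesg.splitlines():
        m = CALLING_RE.search(line)
        if m:
            # Remember when this call began, keyed by the full symbol.
            starts[m.group(2)] = float(m.group(1))
            continue
        m = RETURNED_RE.search(line)
        if m and m.group(1) in starts:
            sym = m.group(1)
            name = sym.split("+", 1)[0]
            timeline.append((name, starts.pop(sym), int(m.group(2))))
    # Sort by start time so the result reads in boot order.
    timeline.sort(key=lambda entry: entry[1])
    return timeline


if __name__ == "__main__":
    # Hypothetical log excerpt: the regulator driver only starts after the
    # slow I2C controller it sits behind has finished initializing.
    sample = """\
[    0.200000] calling  example_i2c_init+0x0/0x30 @ 1
[    0.900000] initcall example_i2c_init+0x0/0x30 returned 0 after 700000 usecs
[    0.900100] calling  example_regulator_init+0x0/0x20 @ 1
[    0.950100] initcall example_regulator_init+0x0/0x20 returned 0 after 50000 usecs
"""
    for name, start, usecs in initcall_timeline(sample):
        print(f"{start:10.6f} s  +{usecs:>7} us  {name}")
```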

Yeah, it's not a "here's where the bug is, go fix it" type tool, but it
kind of is, given that you can see where your time is spent, which is the
most important thing to learn, so you can find what to fix.
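
The earlier question about async probing rests on simple arithmetic: if probes that sleep on hardware run one after another, boot pays the sum of their delays; if they run concurrently, it pays roughly the longest one. The sketch below is a userspace Python model of that effect, not kernel code, and the device names and delays are made up. In the driver core itself, a driver can ask for asynchronous probing by setting `probe_type = PROBE_PREFER_ASYNCHRONOUS` in its `struct device_driver`; the gain only appears when probes actually sleep (msleep) rather than busy-wait (mdelay), and devices that depend on each other still probe in order.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Hypothetical per-device probe delays in seconds, standing in for
# drivers that sleep while waiting for hardware to power up or settle.
PROBE_DELAYS: Dict[str, float] = {
    "panel": 0.20,
    "emmc": 0.15,
    "usb-hub": 0.10,
    "touchpad": 0.05,
}


def fake_probe(name: str) -> str:
    # time.sleep() yields the CPU, like a kernel probe that sleeps
    # (msleep) rather than busy-waiting (mdelay).
    time.sleep(PROBE_DELAYS[name])
    return name


def probe_serially() -> float:
    """Probe every device one after another; return elapsed seconds."""
    start = time.monotonic()
    for name in PROBE_DELAYS:
        fake_probe(name)
    return time.monotonic() - start


def probe_in_parallel() -> float:
    """Probe all (independent) devices concurrently; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(PROBE_DELAYS)) as pool:
        list(pool.map(fake_probe, PROBE_DELAYS))
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"serial:   {probe_serially():.2f} s")  # about the sum of the delays
    print(f"parallel: {probe_in_parallel():.2f} s")  # about the longest delay
```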

thanks,

greg k-h