Re: [PATCH v1] of: platform: Batch fwnode parsing in the init_machine() path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 2, 2020 at 5:14 PM Laurent Pinchart
<laurent.pinchart@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Saravana,
>
> On Fri, Oct 02, 2020 at 12:56:30PM -0700, Saravana Kannan wrote:
> > On Fri, Oct 2, 2020 at 11:35 AM 'Grygorii Strashko' via kernel-team wrote:
> > > On 02/10/2020 21:27, Laurent Pinchart wrote:
> > > > On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> > > >> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> > > >>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> > > >>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@xxxxxxxxxx> wrote:
> > > >>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > >>>>>>
> > > >>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > >>>>>> adding all top level devices") optimized the fwnode parsing when all top
> > > >>>>>> level devices are added, it missed out optimizing this for platform
> > > >>>>>> where the top level devices are added through the init_machine() path.
> > > >>>>>>
> > > >>>>>> This commit does the optimization for all paths by simply moving the
> > > >>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
> > > >>>>>>
> > > >>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@xxxxxx>
> > > >>>>>> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > > >>>>>> ---
> > > >>>>>>   drivers/of/platform.c | 19 +++++++++++++++----
> > > >>>>>>   1 file changed, 15 insertions(+), 4 deletions(-)
> > > >>>>>>
> > > >>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > >>>>>> index 071f04da32c8..79972e49b539 100644
> > > >>>>>> --- a/drivers/of/platform.c
> > > >>>>>> +++ b/drivers/of/platform.c
> > > >>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > > >>>>>>                                   const struct of_dev_auxdata *lookup,
> > > >>>>>>                                   struct device *parent)
> > > >>>>>>   {
> > > >>>>>> -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > >>>>>> -                                   parent);
> > > >>>>>> +       int ret;
> > > >>>>>> +
> > > >>>>>> +       /*
> > > >>>>>> +        * fw_devlink_pause/resume() are only safe to be called around top
> > > >>>>>> +        * level device addition due to locking constraints.
> > > >>>>>> +        */
> > > >>>>>> +       if (!root)
> > > >>>>>> +               fw_devlink_pause();
> > > >>>>>> +
> > > >>>>>> +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > >>>>>> +                                  parent);
> > > >>>>>
> > > >>>>> of_platform_default_populate() vs. of_platform_populate() is just a
> > > >>>>> different match table. I don't think the behavior should otherwise be
> > > >>>>> different.
> > > >>>>>
> > > >>>>> There's also of_platform_probe() which has slightly different matching
> > > >>>>> behavior. It should not behave differently either with respect to
> > > >>>>> devlinks.
> > > >>>>
> > > >>>> So I'm trying to do this only when the top level devices are added for
> > > >>>> the first time. of_platform_default_populate() seems to be the most
> > > >>>> common path. For other cases, I think we just need to call
> > > >>>> fw_devlink_pause/resume() wherever the top level devices are added for
> > > >>>> the first time. As I said in the other email, we can't add
> > > >>>> fw_devlink_pause/resume() by default to of_platform_populate().
> > > >>>>
> > > >>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> > > >>>> only when top level devices are added for the first time"?
> > > >>>
> > > >>> I'm not an expert in this domain, but before investigating it, would you
> > > >>> be able to share a hack patch that implements this (in the most simple
> > > >>> way) to check if it actually fixes the delays I experience on my system
> > > >>> ?
> > > >>
> > > >> So I take it the patch I sent out didn't work for you? Can you tell me
> > > >> what machine/DT you are using?
> > > >
> > > > I've replied to the patch:
> > > >
> > > > Based on v5.9-rc5, before the patch:
> > > >
> > > > [    0.652887] cpuidle: using governor menu
> > > > [   12.349476] No ATAGs?
> > > >
> > > > After the patch:
> > > >
> > > > [    0.650460] cpuidle: using governor menu
> > > > [   12.262101] No ATAGs?
> > > >
> > > > I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> > > > a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> > > > additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> > > >
> > >
> > > hope you are receiving my mails as I've provided you with all required information already [1]
> >
> > Laurent/Grygorii,
> >
> > Looks like I'm definitely missing emails. Sorry about the confusion.
> >
> > I have some other urgent things on my plate right now. Is it okay if I
> > get to this in a day or two? In the end, we'll find a solution that
> > addresses most/all of the delay.
>
> No issue on my side.

Hi Laurent,

Sorry it took awhile for me to get back to this.

Can you try throwing around fw_devlink_pause/resume() around the
of_platform_populate() call in arch/arm/mach-omap2/pdata-quirks.c?
Just trying to verify the cause/fix.

If it fixes the issue, then considering Rob's comments [1], a good
short term solution might be to have the suggestion above and some way
to do pause/resume only when the top level devices are added.

> By the way, during initial investigations, I've traced code paths to
> figure out if there was a particular step that would consume a large
> amount of time, and found out that of_platform_populate() ends up
> executing devlink-related code that seems to have an O(n^3) complexity
> on the number of devices, with a few dozens of milliseconds for each
> iteration. That's a very bad complexity.

As you said, the complexity of fw_devlink parsing can be O(N^2). There
are other ways to improve it to make it O(N) but it has a bunch of
additional complexity and memory increase. When I tried to do it that
way the first time, I was question whether O(N^2) actually translated
to measurable difference. Looks like we do now :) I have something in
mind for how to do it with O(N) complexity, but I expect it to take a
while to get in. So in the meantime, I'm thinking of using
fw_devlink_pause/resume() as a short term optimization.

-Saravana

[1] - https://lore.kernel.org/linux-omap/CAL_Jsq+6mxtFei3+1ic4c5XCftJ8nZK6_S5_d15yEXQ02BTNKw@xxxxxxxxxxxxxx/



[Index of Archives]     [Linux Arm (vger)]     [ARM Kernel]     [ARM MSM]     [Linux Tegra]     [Linux WPAN Networking]     [Linux Wireless Networking]     [Maemo Users]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux