On Thu, Jun 06, 2019 at 10:44:58AM +0200, Vincent Guittot wrote:
> On Thu, 6 Jun 2019 at 10:34, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
> >
> > On 6/6/19 10:20 AM, Vincent Guittot wrote:
> > > On Thu, 6 Jun 2019 at 09:49, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> > >>
> > >> Hi Vincent,
> > >>
> > >> On Thursday 06 Jun 2019 at 09:05:16 (+0200), Vincent Guittot wrote:
> > >>> Hi Quentin,
> > >>>
> > >>> On Wed, 5 Jun 2019 at 19:21, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> > >>>>
> > >>>> On Friday 17 May 2019 at 14:55:19 (-0700), Stephen Boyd wrote:
> > >>>>> Quoting Amit Kucheria (2019-05-16 04:54:45)
> > >>>>>> (cc'ing Andy's correct email address)
> > >>>>>>
> > >>>>>> On Wed, May 15, 2019 at 2:46 AM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>> Quoting Amit Kucheria (2019-05-13 04:54:12)
> > >>>>>>>> On Mon, May 13, 2019 at 4:31 PM Amit Kucheria <amit.kucheria@xxxxxxxxxx> wrote:
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Jan 15, 2019 at 12:13 AM Matthias Kaehlcke <mka@xxxxxxxxxxxx> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> The 8 CPU cores of the SDM845 are organized in two clusters of 4 big
> > >>>>>>>>>> ("gold") and 4 little ("silver") cores. Add a cpu-map node to the DT
> > >>>>>>>>>> that describes this topology.
> > >>>>>>>>>
> > >>>>>>>>> This is partly true. There are two groups of gold and silver cores,
> > >>>>>>>>> but AFAICT they are in a single cluster, not two separate ones. SDM845
> > >>>>>>>>> is one of the early examples of ARM's Dynamiq architecture.
> > >>>>>>>>>
> > >>>>>>>>>> Signed-off-by: Matthias Kaehlcke <mka@xxxxxxxxxxxx>
> > >>>>>>>>>
> > >>>>>>>>> I noticed that this patch sneaked through for this merge window but
> > >>>>>>>>> perhaps we can whip up a quick fix for -rc2?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> And please find attached a patch to fix this up. Andy, since this
> > >>>>>>>> hasn't landed yet (can we still squash this into the original patch?),
> > >>>>>>>> I couldn't add a Fixes tag.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> I had the same concern. Thanks for catching this. I suspect this must
> > >>>>>>> cause some problem for IPA given that it can't discern between the big
> > >>>>>>> and little "power clusters"?
> > >>>>>>
> > >>>>>> Both EAS and IPA, I believe. It influences the scheduler's view of the
> > >>>>>> topology.
> > >>>>>
> > >>>>> And EAS and IPA are OK with the real topology? I'm just curious if
> > >>>>> changing the topology to reflect reality will be a problem for those
> > >>>>> two.
> > >>>>
> > >>>> FWIW, neither EAS nor IPA depends on this. Not the upstream version of
> > >>>> EAS at least (which is used in recent Android kernels -- 4.19+).
> > >>>>
> > >>>> But doing this is still required for other things in the scheduler (the
> > >>>> so-called 'capacity-awareness' code). So until we have a better
> > >>>> solution, this patch is doing the right thing.
> > >>>
> > >>> I'm not sure I catch what you mean?
> > >>> Which so-called 'capacity-awareness' code are you speaking about? And
> > >>> what is the problem?
> > >>
> > >> I'm talking about the wake-up path. ATM select_idle_sibling() is totally
> > >> unaware of capacity differences. In its current form, this function
> > >> basically assumes that all CPUs in a given sd_llc have the same
> > >> capacity, which would be wrong if we had a single MC level for SDM845.
> > >> So, until select_idle_sibling() is 'fixed' to be capacity-aware, we need
> > >> two levels of sd for asymmetric systems (including DynamIQ) so the
> > >> wake_cap() story actually works.
> > >>
> > >> I hope that clarifies it :)
> > >
> > > hmm... does this justify this wrong topology?

No, it doesn't. It relies heavily on how nested clusters are interpreted
too, so it is quite fragile.

> > > select_idle_sibling() is called only when the system is overloaded and
> > > the scheduler disables the EAS path.
> > > In this case, the scheduler looks either for an idle cpu or for evenly
> > > spreading the loads.
> > > This is maybe not always optimal and should probably be fixed but
> > > doesn't justify a wrong topology description IMHO.
> >
> > The big/LITTLE cluster detection in wake_cap() doesn't work anymore with
> > DynamIQ w/o Phantom (DIE) domain. So the decision of going sis() or slow
> > path is IMHO broken.
>
> That's probably not the right thread to discuss this further but I'm
> not sure I understand why wake_cap() doesn't work as it compares the
> capacity_orig of local cpu and prev cpu which are the same whatever
> the sched domain.

We have had this discussion a couple of times over the last couple of
years. The story, IIRC, is that when we introduced capacity awareness in
the wake-up path (wake_cap()) we realised (I think it was actually you)
that we could use select_idle_sibling() in cases where we know that the
search space is limited to cpus with sufficient capacity so we didn't
have to take the long route through find_idlest_cpu(). Back then, big
and little were grouped by clusters so it was "safe" to use
select_idle_sibling() on cpu or prev_cpu if they have sufficient
capacity.

With DynamIQ the true topology on many systems is just one cluster and
hence using select_idle_sibling() there means the search space includes
all cpu types, which isn't "safe" if you have a task requiring more
capacity than can be offered by any cpu in the system.

We need to use the find_idlest_cpu() path in more cases than we do
today. All the code is there I think, we just have to tweak some
conditions. I can try to come up with a simple fix we can discuss and
refine as necessary.

Morten
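
To make the "safe search space" argument above concrete, here is a small
user-space sketch. It is not kernel code and not a proposed fix: struct
cpu_model and all function names are hypothetical, the capacity values
used with it are made up, and the roughly 20% headroom rule is only an
assumption modelled on the scheduler's capacity margin. It shows that a
capacity-unaware idle search is only safe when every cpu in the searched
domain fits the task.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cpu; not a kernel structure. */
struct cpu_model {
	unsigned long capacity;	/* original capacity, 1024 = biggest cpu */
	bool idle;
};

/* Assumed headroom rule: the task fits if util * 1.25 < capacity. */
static bool util_fits(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

/* Capacity-unaware search over one domain: the first idle cpu wins. */
static int pick_first_idle(const struct cpu_model *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].idle)
			return (int)i;
	return -1;
}

/* Capacity-aware variant: the first idle cpu that also fits the task. */
static int pick_first_idle_fitting(const struct cpu_model *cpus, size_t n,
				   unsigned long util)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].idle && util_fits(util, cpus[i].capacity))
			return (int)i;
	return -1;
}

/* The unaware search is "safe" only if every cpu in the domain fits. */
static bool fast_path_is_safe(const struct cpu_model *cpus, size_t n,
			      unsigned long util)
{
	for (size_t i = 0; i < n; i++)
		if (!util_fits(util, cpus[i].capacity))
			return false;
	return true;
}
```

With a domain holding only big cpus, fast_path_is_safe() holds for a task
that needs a big cpu. With one domain mixing little and big cpus, it does
not, and pick_first_idle() can return a little cpu where
pick_first_idle_fitting() would skip it.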