Re: [PATCH v4 10/14] cpuidle: psci: Prepare to use OS initiated suspend mode via PM domains

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Thu, 19 Dec 2019 22:33:34 +0100

On Thu, 19 Dec 2019 at 19:01, Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
>
> On Thu, Dec 19, 2019 at 04:48:13PM +0100, Ulf Hansson wrote:
> > On Thu, 19 Dec 2019 at 15:32, Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
> > >
> > > On Wed, Dec 11, 2019 at 04:43:39PM +0100, Ulf Hansson wrote:
> > > > The per CPU variable psci_power_state, contains an array of fixed values,
> > > > which reflects the corresponding arm,psci-suspend-param parsed from DT, for
> > > > each of the available CPU idle states.
> > > >
> > > > This isn't sufficient when using the hierarchical CPU topology in DT, in
> > > > combination with having PSCI OS initiated (OSI) mode enabled. More
> > > > precisely, in OSI mode, Linux is responsible for telling the PSCI FW what
> > > > idle state the cluster (a group of CPUs) should enter, while in PSCI
> > > > Platform Coordinated (PC) mode, each CPU independently votes for an idle
> > > > state of the cluster.
> > > >
> > > > For this reason, introduce a per CPU variable called domain_state and
> > > > implement two helper functions to read/write its value. Then let the
> > > > domain_state take precedence over the regular selected state, when entering
> > > > an idle state.
> > > >
> > > > To avoid executing the above OSI specific code in the ->enter() callback,
> > > > while operating in the default PSCI Platform Coordinated mode, let's also
> > > > add a new enter-function and use it for OSI.
> > > >
> > > > Co-developed-by: Lina Iyer <lina.iyer@xxxxxxxxxx>
> > > > Signed-off-by: Lina Iyer <lina.iyer@xxxxxxxxxx>
> > > > Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> > > > ---
> > > >
> > > > Changes in v4:
> > > >       - Rebased on top of earlier changes.
> > > >       - Add comment about using the deepest cpuidle state for the domain state
> > > >       selection.
> > > >
> > > > ---
> > > >  drivers/cpuidle/cpuidle-psci.c | 56 ++++++++++++++++++++++++++++++----
> > > >  1 file changed, 50 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > > > index 6a87848be3c3..9600fe674a89 100644
> > > > --- a/drivers/cpuidle/cpuidle-psci.c
> > > > +++ b/drivers/cpuidle/cpuidle-psci.c
> > > > @@ -29,14 +29,47 @@ struct psci_cpuidle_data {
> > > >  };
> > > >
> > > >  static DEFINE_PER_CPU_READ_MOSTLY(struct psci_cpuidle_data, psci_cpuidle_data);
> > > > +static DEFINE_PER_CPU(u32, domain_state);
> > > > +
> > >
> > > [...]
> > >
> > > > +static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > > > +                                     struct cpuidle_driver *drv, int idx)
> > > > +{
> > > > +     struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
> > > > +     u32 *states = data->psci_states;
> > >
> > > Why can't the above be like this for consistency (see below in
> > > psci_enter_idle_state)?
> >
> > You have a point, however in patch11 I am adding this line below.
> >
> > struct device *pd_dev = data->dev;
> >
> > So I don't think it matters much, agree?
> >
>
> Ah OK, it looked odd as part of this patch, maybe you could have moved
> this change into that patch. Anyways fine as is.

Okay, then I'd rather just keep it.

>
> > >
> > >         u32 *states = __this_cpu_read(psci_cpuidle_data.psci_states);
> > >
> > > > +     u32 state = psci_get_domain_state();
> > > > +     int ret;
> > > > +
> > > > +     if (!state)
> > > > +             state = states[idx];
> > > > +
> > > > +     ret = psci_enter_state(idx, state);
> > > > +
> > > > +     /* Clear the domain state to start fresh when back from idle. */
> > > > +     psci_set_domain_state(0);
> > > > +     return ret;
> > > > +}
> > > >
> > >
> > > [...]
> > >
> > > > @@ -118,6 +152,15 @@ static int __init psci_dt_cpu_init_idle(struct device_node *cpu_node,
> > > >                       ret = PTR_ERR(data->dev);
> > > >                       goto free_mem;
> > > >               }
> > > > +
> > > > +             /*
> > > > +              * Using the deepest state for the CPU to trigger a potential
> > > > +              * selection of a shared state for the domain, assumes the
> > > > +              * domain states are all deeper states.
> > > > +              */
> > > > +             if (data->dev)
> > >
> > > You can drop this check, as we return on error above.
> >
> > Actually not, because if OSI is supported, there is still a
> > possibility that the PM domain topology isn't used.
> >
>
> And how do we support that ? I am missing something here.
>
> > This means data->dev is NULL.
> >
>
> I don't get that.

This is quite similar to the existing limited support we have for OSI today.

We are using the idle states for the CPU, but ignoring the idle states
for the cluster. If you just skip applying the DTS patch14, this is
what happens.
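
To make that concrete, here is a minimal user-space sketch of the
selection done at init. It is only a model: model_pick_enter() and the
two stand-in callbacks are hypothetical names, and pd_dev plays the role
of data->dev, which is assumed to be non-NULL only when the PM domain
topology is actually described in DT.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*enter_fn)(int idx);

/* Stand-ins for the regular and the domain-aware ->enter() callbacks. */
static int enter_cpu_state(int idx)    { return idx; }
static int enter_domain_state(int idx) { return idx + 100; }

/*
 * Model of which ->enter() callback a given idle state ends up with.
 * Without a PM domain device (pd_dev == NULL) every state keeps the
 * regular callback, so only the CPU idle states are ever requested.
 */
static enter_fn model_pick_enter(const void *pd_dev, int idx, int state_count)
{
	assert(state_count > 0 && idx >= 0 && idx < state_count);

	/* Only the deepest CPU state is switched to the domain variant. */
	if (pd_dev && idx == state_count - 1)
		return enter_domain_state;

	return enter_cpu_state;
}
```

With pd_dev == NULL the deepest state behaves like all the others, which
is the "CPU states only" case described above.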

>
> > >
> > > > +                     drv->states[state_count - 1].enter =
> > > > +                             psci_enter_domain_idle_state;
> > >
> > > I see the comment above but this potentially blocks retention mode at
> > > cluster level when all CPUs enter retention at CPU level. I don't like
> > > this assumption, but I don't have any better suggestion. Please add the
> > > note that we can't enter RETENTION state at cluster/domain level when
> > > all CPUs enter at CPU level.
> >
> > You are correct, but I think the comment a few lines above (agreed to
> > be added by Lorenzo in the previous version) should be enough to
> > explain that. No?
> >
> > The point is, this is only a problem if cluster RETENTION is
> > considered to be a shallower state than CPU power off, for example.
> >
>
> Yes, but giving examples makes it better and helps people who may be
> wondering why cluster retention state is not being entered. You can just
> add to the above comment:
>
> "e.g. If CPU Retention is one of the shallower states, then we can't enter
> any of the allowed domain states."

Hmm, that's not a correct statement I think, let me elaborate.

The problem is, that in case the CPU has both RETENTION and POWER OFF
(deepest CPU state), we would only be able to reach a cluster state
(RETENTION or POWER OFF) when the CPUs are in CPU POWER OFF (as that's
the deepest).

This is okay, as long as a cluster RETENTION state is considered to be
"deeper" than the CPU POWER OFF state. However, if that isn't the
case, it means the cluster RETENTION state is not considered in the
correct order, but it's still possible to reach as a "domain state".
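
As a rough illustration of the above, a user-space sketch (not the
kernel code). The suspend parameter values and model_enter() are made up
for the example, and domain_state stands for the per CPU variable, where
0 is assumed to mean that no cluster state has been selected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical suspend parameters, not taken from any real DT. */
enum {
	CPU_RET     = 0x01,
	CPU_OFF     = 0x02,
	CLUSTER_RET = 0x10,
	CLUSTER_OFF = 0x20,
};

#define NR_STATES 3
#define DEEPEST   (NR_STATES - 1)

/* Per-CPU idle states, shallowest first: WFI, retention, power off. */
static const uint32_t cpu_states[NR_STATES] = { 0, CPU_RET, CPU_OFF };

/* Returns the parameter that would be handed to the PSCI FW. */
static uint32_t model_enter(int idx, uint32_t domain_state)
{
	assert(idx >= 0 && idx < NR_STATES);

	/* Only the deepest CPU state lets the domain state take precedence. */
	if (idx == DEEPEST && domain_state)
		return domain_state;

	return cpu_states[idx];
}
```

So model_enter(DEEPEST, CLUSTER_RET) yields the cluster RETENTION
parameter, while model_enter(1, CLUSTER_RET) still yields CPU_RET. In
other words, cluster RETENTION stays reachable, but only via CPU POWER
OFF.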

I think this all is kind of summarized in the comment I agreed upon
with Lorenzo, but if you still think there is some clarification
needed, I'm happy to add it.

Makes sense?

[...]

Kind regards
Uffe