On Tue, Jul 16, 2019 at 10:45:49AM +0200, Ulf Hansson wrote: [...] > > > +static void psci_pd_convert_states(struct cpuidle_state *idle_state, > > > + u32 *psci_state, struct genpd_power_state *state) > > > +{ > > > + u32 *state_data = state->data; > > > + u64 target_residency_us = state->residency_ns; > > > + u64 exit_latency_us = state->power_on_latency_ns + > > > + state->power_off_latency_ns; > > > + > > > + *psci_state = *state_data; > > > + do_div(target_residency_us, 1000); > > > + idle_state->target_residency = target_residency_us; > > > + do_div(exit_latency_us, 1000); > > > + idle_state->exit_latency = exit_latency_us; > > > + idle_state->enter = &psci_pd_enter_pc; > > > + idle_state->enter_s2idle = &psci_pd_enter_s2idle_pc; > > > + idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP; > > > > This is arbitrary and not necessarily true. > > The arbitrary thing you refer to here, is that the > CPUIDLE_FLAG_TIMER_STOP? Or are you referring to the complete function > psci_pd_convert_states()? I refer to CPUIDLE_FLAG_TIMER_STOP. I think that on platform coordinated system we should not bother about the hierarchical representation of the states (I understand I asked you to make it work but it has become too complex, I would rather focus on making the hierarchical representation work for all idle states combination in OSI mode). Plus side, another level of complexity removed. > > I think that this patch is useful to represent my reservations about the > > current approach. As a matter of fact, idle state entry will always be a > > CPUidle decision. > > > > You only need PM domain information to understand when all CPUs > > in a power domain are actually idle but that's all genPD can do > > in this respect. > > > > I think this patchset would be much simpler if both CPUidle and > > genPD governor would work on *one* set of idle states, globally > > indexed (and that would be true for PSCI suspend parameters too). > > > > To work with a unified set of idle states between CPUidle and genPD > > (tossing some ideas around): > > > > - We can implement a genPD CPUidle governor that in its select method > > takes into account genPD information (for instance by avoiding > > selection of idle states that require multiple cpus to be in idle > > to be effectively active) > > - We can use genPD to enable/disable CPUidle states through runtime > > PM information > > I don't understand how to make this work. > > The CPUidle governor works on per CPU basis. The genpd governor works > on per PM domain basis, which typically can be a group of CPUs (and > even other devices) via subdomains, for example. > > 1. > In case of Linux being in *charge* of what idle state to pick for a > group of CPUs, that decision is best done by the genpd governor as it > operates on "groups" of CPUs. This is used for PSCI OSI mode. Of > course, there are idle states also per CPU, which potentially could be > managed by the genpd governor as well, but at this point I decided to > re-use the cpuidle governor as it's already doing the job. > > 2. In case the decision of what idle state to enter for the group is > done by the FW, we can rely solely on the cpuidle governor and let it > select states per CPU basis. This is used for PSCI PC mode. > > > > > There may be other ways. My point is that current code, with two (or > > more if the hierarchy grows) sets of idle states across two subsystems > > (CPUidle and genPD) is not very well defined and honestly very hard to > > grasp and prone to errors. > > The complexity is there, I admit that. > > I guess some deeper insight about genpd+its governor+runtime PM are > needed when reviewing this, of course. As an option, you may also have > a look at my slides [1] from OSPM (Pisa) in May this year, which via > flow charts try to describes things in more detail. > > In our offlist meeting, we discussed that perhaps moving some of the > new PSCI code introduced in this series, into a cpuidle driver > instead, could make things more clear. For sure, I can explore that > option, but before I go there, I think we should agree on it publicly. I will do it but given that the generic idle infrastructure basically is there for PSCI and: drivers/soc/qcom/spm.c if we create a PSCI CPUidle driver we can write one for qcom-spm and remove the generic idle infrastructure, there would not be much point in keeping it in the kernel; at least on ARM64 not using PSCI for CPUidle is not even an option. > In principle what it means is to invent a special cpuidle driver for > PSCI, so we would need access to some of the PSCI internal functions, > for example. Yes. > One thing though, the initialization of the PSCI PM domain topology is > a separate step, managed via the new initcall, psci_dt_topology_init() > (introduced in patch 11). That part still seems to be belong to the > PSCI code, don't you think? Yes but at least we can call it from a sensible place (well, sensible, most likely from an initcall given how idle drivers are initialized). > > > + strncpy(idle_state->name, to_of_node(state->fwnode)->name, > > > + CPUIDLE_NAME_LEN - 1); > > > + strncpy(idle_state->desc, to_of_node(state->fwnode)->name, > > > + CPUIDLE_NAME_LEN - 1); > > > +} > > > + > > > +static bool psci_pd_is_provider(struct device_node *np) > > > +{ > > > + struct psci_pd_provider *pd_prov, *it; > > > + > > > + list_for_each_entry_safe(pd_prov, it, &psci_pd_providers, link) { > > > + if (pd_prov->node == np) > > > + return true; > > > + } > > > + > > > + return false; > > > +} > > > + > > > static int psci_pd_init(struct device_node *np) > > > { > > > struct generic_pm_domain *pd; > > > @@ -265,4 +316,71 @@ int psci_dt_init_pm_domains(struct device_node *np) > > > pr_err("failed to create CPU PM domains ret=%d\n", ret); > > > return ret; > > > } > > > + > > > +int psci_dt_pm_domains_parse_states(struct cpuidle_driver *drv, > > > + struct device_node *cpu_node, u32 *psci_states) > > > +{ > > > + struct genpd_power_state *pd_states; > > > + struct of_phandle_args args; > > > + int ret, pd_state_count, i, state_idx, psci_idx; > > > + u32 cpu_psci_state = psci_states[drv->state_count - 2]; > > > > This (-2) is very dodgy and I doubt it would work on hierarchies going > > above "cluster" level. > > > > As I say above, I think we should work towards a single array of > > idle states to be selected by a CPUidle governor using genPD > > runtime information to bias the results according to the number > > of CPUs in a genPD that entered/exit idle. > > > > To be more precise, all idles states should be "domain-idle-state" > > compatible, even the CPU ones, the distinction between what CPUidle > > and genPD manage is a bit stretched IMO in this patchset. > > > > We will have a chance to talk about this but I thought I would > > comment publically if anyone else is willing to chime in, this > > is not a PSCI problem at all, it is a CPUidle/genPD coexistence > > design problem which is much broader. > > To move this forward, I think we need to move from vague ideas to > clear and distinct plans. Whatever that means. :-) See above. > I understand you are concerned about the level of changes introduced > to the PSCI code. As I stated somewhere in earlier replies, I picked > that approach as I thought it was better to implement things in a PSCI > specific manner to start with, then we could move things around, when > we realize that it make sense. I am also concerned about how the idle states are managed in this patchset and I am pretty certain it will break when we move away from a simple hierarchy with one CPU state and one cluster state, we will comment on the specifics. Moving PSCI code into a CPUidle driver will cater for the rest. > Anyway, as a suggestion to address your concern, how about this: > > 1. Move some things out to a PSCI cpuidle driver. We need to decide > more exactly on what to move and find the right level for the > interfaces. I will do it and post patches asap. > 2. Don't attach the CPU to the PM domain topology in case the PSCI PC > mode is used. I think this makes it easier, at least as a first step, > to understand when runtime PM needs to be used/enabled. In the PSCI CPUidle driver we can have two distinct struct cpuidle_state->enter functions for PC and OSI, no overhead for PC, runtime PM for OSI, decoupling done. We can choose one or the other depending on whether: OSI iff: - OSI is available - hierarchical idle states are present in DT otherwise PC. That's what this patch does but we will do it in a unified file. > 3. Would it help if I volunteer to help you guys as a maintainer for > PSCI. At least for the part of the new code that becomes introduced? We will do as described above if that makes sense. Thanks, Lorenzo