On Tue, 10 Aug 2021 at 01:56, Dmitry Osipenko <digetx@xxxxxxxxx> wrote: > > 09.08.2021 17:15, Ulf Hansson пишет: > >> We did that in a previous versions of this series where drivers were > >> calling devm_tegra_core_dev_init_opp_table() helper during the probe to > >> initialize performance state of the domain. Moving OPP state > >> initialization into central place made drivers cleaner by removing the > >> boilerplate code. > > I am not against doing this in a central place, like $subject patch > > suggests. As a matter of fact, it makes perfect sense to me. > > > > However, what I am concerned about, is that you require to use genpd > > internal data structures to do it. I think we should try to avoid > > that. > > Alright, what do you think about this: > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c > index a934c679e6ce..5faed62075e9 100644 > --- a/drivers/base/power/domain.c > +++ b/drivers/base/power/domain.c > @@ -2669,12 +2669,37 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev, > dev->pm_domain->detach = genpd_dev_pm_detach; > dev->pm_domain->sync = genpd_dev_pm_sync; > > + if (pd->default_performance_state) { > + unsigned int default_pstate; > + > + ret = pd->default_performance_state(pd, dev); > + if (ret < 0) { > + dev_err(dev, "failed to get default performance state for PM domain %s: %d\n", > + pd->name, ret); > + goto out; > + } Adding a new callback seems reasonable to support this. > + > + default_pstate = ret; > + > + if (power_on) { > + ret = dev_pm_genpd_set_performance_state(dev, default_pstate); However, this is more questionable to me. First, I don't think we should care about whether this is "power_on" or not. At this point, performance states are treated orthogonal to idle states in genpd. We may decide to change that in some way, but that deserves a different change. Second, I don't think we should call dev_pm_genpd_set_performance_state() from here. It's probably better handled from the genpd callback itself, if/when needed. That said, perhaps the new callback should just return a regular error code and zero on success, rather than the current performance state. See more below. > + if (ret) { > + dev_err(dev, "failed to set default performance state %u for PM domain %s: %d\n", > + default_pstate, pd->name, ret); > + goto out; > + } > + } else { > + dev_gpd_data(dev)->rpm_pstate = default_pstate; No, this isn't the right thing to do. It looks like you are trying to use the ->rpm_pstate for synchronization with runtime PM for consumer drivers. This is fragile as it depends on the runtime PM deployment in the consumer driver. I think you should look at ->rpm_pstate as a variable solely for managing save/restore of the performance state for the device, during runtime suspend/resume in genpd. Synchronization of a vote for a performance state for a device, needs to be managed by calling dev_pm_genpd_set_performance_state() - or by calling an OPP function that calls it, like dev_pm_opp_set_rate(), for example. > + } > + } > + > if (power_on) { > genpd_lock(pd); > ret = genpd_power_on(pd, 0); > genpd_unlock(pd); > } > > +out: > if (ret) > genpd_remove_device(pd, dev); > > diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c > index 81d1f019fa0c..9efb55f52462 100644 > --- a/drivers/soc/tegra/pmc.c > +++ b/drivers/soc/tegra/pmc.c > @@ -518,15 +518,14 @@ static const char * const tegra_emc_compats[] = { > * We retrieve clock rate of the attached device and initialize domain's > * performance state in accordance to the clock rate. > */ > -static int tegra_pmc_pd_attach_dev(struct generic_pm_domain *genpd, > - struct device *dev) > +static int tegra_pmc_genpd_default_perf_state(struct generic_pm_domain *genpd, > + struct device *dev) > { > - struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev); > struct opp_table *opp_table, *pd_opp_table; > struct generic_pm_domain *core_genpd; > struct dev_pm_opp *opp, *pd_opp; > - unsigned long rate, state; > struct gpd_link *link; > + unsigned long rate; > struct clk *clk; > u32 hw_version; > int ret; > @@ -633,8 +632,7 @@ static int tegra_pmc_pd_attach_dev(struct generic_pm_domain *genpd, > * RPM-resume of the device. This means that drivers don't need to > * explicitly initialize performance state. > */ > - state = pm_genpd_opp_to_performance_state(&core_genpd->dev, pd_opp); > - gpd_data->rpm_pstate = state; > + ret = pm_genpd_opp_to_performance_state(&core_genpd->dev, pd_opp); I don't see how this avoids tegra_pmc_genpd_default_perf_state() from having to walk &genpd->child_links. That's still an issue, right? > dev_pm_opp_put(pd_opp); > > put_pd_opp_table: > @@ -1383,7 +1381,7 @@ static int tegra_powergate_add(struct tegra_pmc *pmc, struct device_node *np) > > pg->id = id; > pg->genpd.name = np->name; > - pg->genpd.attach_dev = tegra_pmc_pd_attach_dev; > + pg->genpd.default_performance_state = tegra_pmc_genpd_default_perf_state; > pg->genpd.power_off = tegra_genpd_power_off; > pg->genpd.power_on = tegra_genpd_power_on; > pg->pmc = pmc; > @@ -1500,7 +1498,7 @@ static int tegra_pmc_core_pd_add(struct tegra_pmc *pmc, struct device_node *np) > return -ENOMEM; > > genpd->name = np->name; > - genpd->attach_dev = tegra_pmc_pd_attach_dev; > + genpd->default_performance_state = tegra_pmc_genpd_default_perf_state; > genpd->set_performance_state = tegra_pmc_core_pd_set_performance_state; > genpd->opp_to_performance_state = tegra_pmc_core_pd_opp_to_performance_state; > > diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h > index 21a0577305ef..cd4867817ca5 100644 > --- a/include/linux/pm_domain.h > +++ b/include/linux/pm_domain.h > @@ -143,6 +143,8 @@ struct generic_pm_domain { > struct device *dev); > void (*detach_dev)(struct generic_pm_domain *domain, > struct device *dev); > + int (*default_performance_state)(struct generic_pm_domain *domain, > + struct device *dev); > unsigned int flags; /* Bit field of configs for genpd */ > struct genpd_power_state *states; > void (*free_states)(struct genpd_power_state *states, > > >> I can revert back to the previous variant, although this variant works > >> well too. > > I looked at that code and in that path we end up calling > > dev_pm_opp_set_rate(), after it has initialized the opp table for the > > device. > > > > Rather than doing the OF parsing above to find out the current state > > for the device, why can't you just call dev_pm_opp_set_rate() to > > initialize a proper vote instead? > > > > For some devices clock rate is either preset by bootloader, or by clk driver, or by assigned-clocks in a device-tree. And then I don't see what's the difference in comparison to initialization for the current rate. > > For some devices, like memory controller, we can't just change the clock rate because it's a complex procedure and some boards will use fixed rate, but the power vote still must be initialized. I am not saying you should change the clock rate. The current code path that runs via devm_tegra_core_dev_init_opp_table() just calls clk_get_rate and then dev_pm_opp_set_rate() with the current rate to vote for the corresponding OPP level. Right? Isn't this exactly what you want? No? Kind regards Uffe