On 13-06-19, 15:24, Viresh Kumar wrote: > I am confused as hell on what we should be doing and what we are doing > right now. And if we should do better. > > Let me explain with an example. > > - The clock provider supports following frequencies: 500, 600, 700, > 800, 900, 1000 MHz. > > - The OPP table contains/supports only a subset: 500, 700, 1000 MHz. > > Now, the request to change the frequency starts from cpufreq > governors, like schedutil when they calls: > > __cpufreq_driver_target(policy, 599 MHz, CPUFREQ_RELATION_L); > > CPUFREQ_RELATION_L means: lowest frequency at or above target. And so > I would expect the frequency to get set to 600MHz (if we look at clock > driver) or 700MHz (if we look at OPP table). I think we should decide > this thing from the OPP table only as that's what the platform guys > want us to use. So, we should end up with 700 MHz. > > Then we land into dev_pm_opp_set_rate(), which does this (which is > code copied from earlier version of cpufreq-dt driver): > > - clk_round_rate(clk, 599 MHz). > > clk_round_rate() returns the highest frequency lower than target. So > it must return 500 MHz (I haven't tested this yet, all theoretical). > > - _find_freq_ceil(opp_table, 500 MHz). > > This works like CPUFREQ_RELATION_L, so we find lowest frequency >= > target freq. And so we should get: 500 MHz itself as OPP table has > it. > > - clk_set_rate(clk, 500 MHz). > > This must be doing round-rate again, but I think we will settle with > 500 MHz eventually. > > > Now the questionnaire: > > - Is this whole exercise correct ? No, I missed the call to cpufreq_frequency_table_target() in __cpufreq_driver_target() which finds the exact frequency from cpufreq table (which was created using opp table) and so we never screw up here. Sorry for confusing everyone on this :( > Now lets move to this patch, which makes it more confusing. > > The OPP tables for CPUs and GPUs should already be somewhat like fmax > tables for particular voltage values and that's why both cpufreq and > OPP core try to find a frequency higher than target so we choose the > most optimum one power-efficiency wise. > > For cases where the OPP table is only a subset of the clk-providers > table (almost always), if we let the clock provider to find the > nearest frequency (which is lower) we will run the CPU/GPU at a > not-so-optimal frequency. i.e. if 500, 600, 700 MHz all need voltage > to be 1.2 V, we should be running at 700 always, while we may end up > running at 500 MHz. This won't happen for CPUs because of the reason I explained earlier. cpufreq core does the rounding before calling dev_pm_opp_set_rate(). And no other devices use dev_pm_opp_set_rate() right now apart from CPUs, so we are not going to break anything. > This kind of behavior (introduced by this patch) is important for > other devices which want to run at the nearest frequency to target > one, but not for CPUs/GPUs. So, we need to tag these IO devices > separately, maybe from DT ? So we select the closest match instead of > most optimal one. Hmm, so this patch won't break anything and I am inclined to apply it again :) Does anyone see any other issues with it, which I might be missing ? -- viresh