On Tue, 04 Oct 2022 06:00:22 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

> On 04/10/2022 10:29, Tvrtko Ursulin wrote:
> >
> > On 03/10/2022 20:24, Ashutosh Dixit wrote:
> >> PMU and sysfs use different wakeref's to "interpret" zero freq. Sysfs
> >> uses runtime PM wakeref (see intel_rps_read_punit_req and
> >> intel_rps_read_actual_frequency). PMU uses the GT parked/unparked
> >> wakeref. In general the GT wakeref is held for less time that the runtime
> >> PM wakeref which causes PMU to report a lower average freq than the
> >> average freq obtained from sampling sysfs.
> >>
> >> To resolve this, use the same freq functions (and wakeref's) in PMU as
> >> those used in sysfs.
> >>
> >> Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7025
> >> Reported-by: Ashwin Kumar Kulkarni <ashwin.kumar.kulkarni@xxxxxxxxx>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> >> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
> >> ---
> >>  drivers/gpu/drm/i915/i915_pmu.c | 27 ++-------------------------
> >>  1 file changed, 2 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> >> index 958b37123bf1..eda03f264792 100644
> >> --- a/drivers/gpu/drm/i915/i915_pmu.c
> >> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> >> @@ -371,37 +371,16 @@ static void
> >>  frequency_sample(struct intel_gt *gt, unsigned int period_ns)
> >>  {
> >>      struct drm_i915_private *i915 = gt->i915;
> >> -    struct intel_uncore *uncore = gt->uncore;
> >>      struct i915_pmu *pmu = &i915->pmu;
> >>      struct intel_rps *rps = &gt->rps;
> >>      if (!frequency_sampling_enabled(pmu))
> >>          return;
> >> -    /* Report 0/0 (actual/requested) frequency while parked. */
> >> -    if (!intel_gt_pm_get_if_awake(gt))
> >> -        return;
> >> -
> >>      if (pmu->enable & config_mask(I915_PMU_ACTUAL_FREQUENCY)) {
> >> -        u32 val;
> >> -
> >> -        /*
> >> -         * We take a quick peek here without using forcewake
> >> -         * so that we don't perturb the system under observation
> >> -         * (forcewake => !rc6 => increased power use). We expect
> >> -         * that if the read fails because it is outside of the
> >> -         * mmio power well, then it will return 0 -- in which
> >> -         * case we assume the system is running at the intended
> >> -         * frequency. Fortunately, the read should rarely fail!
> >> -         */
> >> -        val = intel_uncore_read_fw(uncore, GEN6_RPSTAT1);
> >> -        if (val)
> >> -            val = intel_rps_get_cagf(rps, val);
> >> -        else
> >> -            val = rps->cur_freq;
> >> -
> >>          add_sample_mult(&pmu->sample[__I915_SAMPLE_FREQ_ACT],
> >> -                        intel_gpu_freq(rps, val), period_ns / 1000);
> >> +                        intel_rps_read_actual_frequency(rps),
> >> +                        period_ns / 1000);
> >>      }
> >>      if (pmu->enable & config_mask(I915_PMU_REQUESTED_FREQUENCY)) {
> >
> > What is software tracking of requested frequency showing when GT is
> > parked or runtime suspended? With this change sampling would be outside
> > any such checks so we need to be sure reported value makes sense.
> >
> > Although more important open is around what is actually correct.
> >
> > For instance how does the patch affect RC6 and power? I don't know how
> > power management of different blocks is wired up, so personally I would
> > only be able to look at it empirically. In other words what I am asking
> > is this - if we changed from skipping obtaining forcewake even when
> > unparked, to obtaining forcewake if not runtime suspended - what hardware
> > blocks does that power up and how it affects RC6 and power? Can it affect
> > actual frequency or not? (Will "something" power up the clocks just
> > because we will be getting forcewake?)
> >
> > Or maybe question simplified - does 200Hz polling on existing sysfs
> > actual frequency field disturbs the system under some circumstances?
> > (Increases power and decreases RC6.) If it does then that would be a
> > problem. We want a solution which shows the real data, but where the act
> > of monitoring itself does not change it too much. If it doesn't then it's
> > okay.
> >
> > Could you somehow investigate on these topics? Maybe log RAPL GPU power
> > while polling on sysfs, versus getting the actual frequency from the
> > existing PMU implementation and see if that shows anything? Or actually
> > simpler - RAPL GPU power for current PMU intel_gpu_top versus this patch?
> > On idle(-ish) desktop workloads perhaps? Power and frequency graphed for
> > both.
>
> Another thought - considering that bspec says for 0xa01c "This register
> reflects real-time values and thus does not have a pre-determined default
> value out of reset" - could it be that it also does not reflect a real
> value when GPU is not executing anything (so zero), just happens to be not
> runtime suspended? That would mean sysfs reads could maybe show last known
> value? Just a thought to check.

Thanks for the suggestion. I'll try to check and report what I find.

> I've also tried on my Alderlake desktop:
>
> 1)
>
> while true; do cat gt_act_freq_mhz >/dev/null; sleep 0.005; done
>
> This costs ~120mW of GPU power and ~20% decrease in RC6.
>
>
> 2)
>
> intel_gpu_top -l -s 5 >/dev/null
>
> This costs no power or RC6.

Thanks for the experiments. As I mentioned, Gen12+ uses a different
register which doesn't require taking a forcewake (it's not upstream yet,
but you can see it in this patch:
https://patchwork.freedesktop.org/patch/504920/?series=109116&rev=1#comment_910146),
so this issue should not be present, at least for Gen12+.

> I have also never observed sysfs to show below min freq. This was with no
> desktop so it's possible this register indeed does not reflect the real
> situation when things are idle.
>
> So I think it is possible sysfs value is the misleading one.

Thanks, I will check. The other possibility is that someone is holding a
forcewake. The products where we are seeing this have GuC controlling both
the frequency (SLPC) and RC6 (GUCRC).

Thanks.
--
Ashutosh