Hi Chris,

On Wed, Mar 25, 2020 at 08:10:56AM +0000, Chris Wilson wrote:
> Measure and compare the energy consumed, as reported by the rapl MSR,
> by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
> at least halve the energy consumption of RC0, as this more than likely
> means we failed to enter RC6 correctly.
>
> If we can't measure the energy draw with the MSR, then it will report 0
> for both measurements. Since the measurement works on all gen6+, this seems
> worth flagging as an error.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> Cc: Andi Shyti <andi.shyti@xxxxxxxxx>

It would be nice to have a revision history, given that I received
quite a few versions of this patch.

> +static u64 energy_uJ(struct intel_rc6 *rc6)
> +{
> +	unsigned long long power;
> +	u32 units;
> +
> +	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
> +		return 0;
> +
> +	units = (power & 0x1f00) >> 8;
> +
> +	if (rdmsrl_safe(MSR_PP1_ENERGY_STATUS, &power))
> +		return 0;
> +
> +	return (1000000 * power) >> units; /* convert to uJ */
> +}

Shall we put this in a library?

>  	res[0] = rc6_residency(rc6);
> +	dt = ktime_get();
> +	rc0_power = energy_uJ(rc6);
>  	msleep(250);
> +	rc0_power = energy_uJ(rc6) - rc0_power;
> +	dt = ktime_sub(ktime_get(), dt);
>  	res[1] = rc6_residency(rc6);
>  	if ((res[1] - res[0]) >> 10) {
>  		pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
> @@ -63,13 +85,23 @@ int live_rc6_manual(void *arg)
>  		goto out_unlock;
>  	}
>
> +	rc0_power = div64_u64(NSEC_PER_SEC * rc0_power, ktime_to_ns(dt));
> +	if (!rc0_power) {

Is this likely to happen?

>  	res[0] = rc6_residency(rc6);
> +	dt = ktime_get();
> +	rc6_power = energy_uJ(rc6);
>  	msleep(100);
> +	rc6_power = energy_uJ(rc6) - rc6_power;
> +	dt = ktime_sub(ktime_get(), dt);
>  	res[1] = rc6_residency(rc6);
> -
>  	if (res[1] == res[0]) {
>  		pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
>  		       intel_uncore_read_fw(gt->uncore, GEN6_RC_STATE),
> @@ -78,6 +110,15 @@ int live_rc6_manual(void *arg)
>  		err = -EINVAL;
>  	}
>
> +	rc6_power = div64_u64(NSEC_PER_SEC * rc6_power, ktime_to_ns(dt));
> +	pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
> +		rc0_power, rc6_power);
> +	if (2 * rc6_power > rc0_power) {
> +		pr_err("GPU leaked energy while in RC6!\n");
> +		err = -EINVAL;
> +		goto out_unlock;
> +	}

Nice,

Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxx>

Thanks,
Andi
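
For reference, the arithmetic in the quoted patch can be checked outside the kernel. The block below is only a user-space sketch, not the selftest code: the helper names (rapl_energy_shift, raw_to_uJ, average_uW, rc6_saves_enough) are hypothetical, uint64_t stands in for u64, and plain division stands in for div64_u64(). It assumes the usual RAPL layout in which bits 12:8 of MSR_RAPL_POWER_UNIT give the energy unit as 1/2^ESU joules per counter tick.

```c
#include <stdint.h>

/*
 * Extract the energy status unit (ESU) from a raw MSR_RAPL_POWER_UNIT
 * value. Same mask and shift as energy_uJ() in the quoted patch.
 */
static unsigned int rapl_energy_shift(uint64_t power_unit_msr)
{
	return (power_unit_msr & 0x1f00) >> 8;
}

/*
 * Convert a raw energy counter value to microjoules. One tick is
 * 1 / 2^shift joules, so scale to uJ first and then shift down.
 */
static uint64_t raw_to_uJ(uint64_t raw, unsigned int shift)
{
	return (1000000ull * raw) >> shift;
}

/*
 * Average power in microwatts for an energy delta (uJ) measured over
 * dt_ns nanoseconds. Returns 0 for an empty interval (an assumption of
 * this sketch; the patch always sleeps before dividing).
 */
static uint64_t average_uW(uint64_t delta_uJ, uint64_t dt_ns)
{
	if (!dt_ns)
		return 0;
	return (1000000000ull * delta_uJ) / dt_ns;
}

/*
 * Pass/fail rule from the patch, inverted: the patch errors out when
 * 2 * rc6_power > rc0_power, so the test passes when RC6 draws at most
 * half of the RC0 power.
 */
static int rc6_saves_enough(uint64_t rc0_uW, uint64_t rc6_uW)
{
	return 2 * rc6_uW <= rc0_uW;
}
```

With a made-up unit register value of 0xA0E03 the shift is 14, so 16384 counter ticks correspond to 1 J (1000000 uJ). An energy delta of 250000 uJ over a 250 ms window works out to an average of 1000000 uW, and an RC6 reading of 400000 uW would pass against that while 600000 uW would not.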