Re: [PATCH v6] drm/i915/selftests: Implement frequency logging for energy reading validation

On 13-11-2024 15:20, Sk Anirban wrote:
Log the GPU frequency in RC0 and RC6 so that failures in the energy
checks (GPU energy leaks and missing power measurements) can be
correlated with the frequency the GPU was actually running at.

v2:
   - Improved commit message.
v3:
   - Used pr_err log to display frequency (Anshuman)
   - Sorted headers alphabetically (Sai Teja)
v4:
   - Improved commit message.
   - Fix pr_err log (Sai Teja)
v5:
   - Add error & debug logging for RC0 power and frequency checks (Anshuman)
v6:
   - Modify debug logging for RC0 power and frequency checks (Sai Teja)

Signed-off-by: Sk Anirban <sk.anirban@xxxxxxxxx>
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@xxxxxxxxx>
---
  drivers/gpu/drm/i915/gt/selftest_rc6.c | 15 +++++++++++++--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 1aa1446c8fb0..a8776f88d6a1 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -8,6 +8,7 @@
  #include "intel_gpu_commands.h"
  #include "intel_gt_requests.h"
  #include "intel_ring.h"
+#include "intel_rps.h"
  #include "selftest_rc6.h"

  #include "selftests/i915_random.h"
@@ -38,6 +39,9 @@ int live_rc6_manual(void *arg)
  	ktime_t dt;
  	u64 res[2];
  	int err = 0;
+	u32 rc0_freq = 0;
+	u32 rc6_freq = 0;
+	struct intel_rps *rps = &gt->rps;

  	/*
  	 * Our claim is that we can "encourage" the GPU to enter rc6 at will.
@@ -66,6 +70,7 @@ int live_rc6_manual(void *arg)
  	rc0_power = librapl_energy_uJ() - rc0_power;
  	dt = ktime_sub(ktime_get(), dt);
  	res[1] = rc6_residency(rc6);
+	rc0_freq = intel_rps_read_actual_frequency(rps);
  	if ((res[1] - res[0]) >> 10) {
  		pr_err("RC6 residency increased by %lldus while disabled for 1000ms!\n",
  		       (res[1] - res[0]) >> 10);
@@ -77,7 +82,11 @@ int live_rc6_manual(void *arg)
  		rc0_power = div64_u64(NSEC_PER_SEC * rc0_power,
  				      ktime_to_ns(dt));
  		if (!rc0_power) {
-			pr_err("No power measured while in RC0\n");
+			if (rc0_freq)
+				pr_err("No power measured while in RC0! GPU Freq: %u in RC0\n",
+				       rc0_freq);
+			else
+				pr_err("No power and freq measured while in RC0\n");
  			err = -EINVAL;
  			goto out_unlock;
  		}
@@ -91,6 +100,7 @@ int live_rc6_manual(void *arg)
  	dt = ktime_get();
  	rc6_power = librapl_energy_uJ();
  	msleep(100);
+	rc6_freq = intel_rps_read_actual_frequency(rps);

I think the intention of reading the frequency here is to find out whether the device was actually in RC6 when the test fails. But on platforms below gen12, reading the actual frequency wakes the GT, because the GEN6_RPSTAT register requires forcewake. To avoid waking the device while it is in RC6, read the actual frequency without taking forcewake.

Additionally, add a delay, perhaps 1 second, after manually re-enabling RC6 and flushing forcewake.

Regards,
Badal
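
To illustrate the forcewake concern raised above, here is a toy userspace model in C. It is not i915 code: `toy_gt`, `toy_read_freq_forcewake` and `toy_read_freq_raw` are hypothetical names, and the model assumes, purely for illustration, that a raw read of a power-gated GT returns 0. It only shows why a read that takes forcewake can disturb the RC6 measurement it is meant to annotate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in i915. */
struct toy_gt {
	bool in_rc6;           /* true while the GT is power-gated */
	unsigned int wakes;    /* how many reads pulled the GT out of RC6 */
	unsigned int freq_mhz; /* frequency the GT runs at when awake */
};

/* Models a register read that takes forcewake first: the GT is woken. */
static unsigned int toy_read_freq_forcewake(struct toy_gt *gt)
{
	assert(gt != NULL);
	if (gt->in_rc6) {
		gt->in_rc6 = false;
		gt->wakes++;
	}
	return gt->freq_mhz;
}

/* Models a raw read: power state untouched; assumed to read 0 when gated. */
static unsigned int toy_read_freq_raw(const struct toy_gt *gt)
{
	assert(gt != NULL);
	return gt->in_rc6 ? 0 : gt->freq_mhz;
}
```

In this model, calling the forcewake variant during the RC6 sampling window leaves `in_rc6` false, so any energy sampled afterwards no longer reflects RC6. The raw variant leaves the state alone.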

  	rc6_power = librapl_energy_uJ() - rc6_power;
  	dt = ktime_sub(ktime_get(), dt);
  	res[1] = rc6_residency(rc6);
@@ -108,7 +118,8 @@ int live_rc6_manual(void *arg)
  		pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
  			rc0_power, rc6_power);
  		if (2 * rc6_power > rc0_power) {
-			pr_err("GPU leaked energy while in RC6!\n");
+			pr_err("GPU leaked energy while in RC6! GPU Freq: %u in RC6 and %u in RC0\n",
+			       rc6_freq, rc0_freq);
  			err = -EINVAL;
  			goto out_unlock;
  		}
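
For readers following the arithmetic in the hunks above, here is a minimal userspace sketch in C of the power conversion and the leak criterion. The helper names `avg_power_uW` and `rc6_leaks` are made up for this example; in the selftest the same steps are done inline with `div64_u64()` and the `2 * rc6_power > rc0_power` comparison.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Average power in microwatts from an energy delta in microjoules
 * measured over dt_ns nanoseconds (uJ per second equals uW).
 * Hypothetical helper; dt_ns must be non-zero.
 */
static uint64_t avg_power_uW(uint64_t energy_uJ, uint64_t dt_ns)
{
	assert(dt_ns > 0);
	return (NSEC_PER_SEC * energy_uJ) / dt_ns;
}

/*
 * Leak criterion used by the test: RC6 is considered leaky when it
 * draws more than half of the RC0 power. Hypothetical helper.
 */
static int rc6_leaks(uint64_t rc0_uW, uint64_t rc6_uW)
{
	return 2 * rc6_uW > rc0_uW;
}
```

With made-up sample values, 1500000 uJ over 1 s gives 1500000 uW in RC0, and 20000 uJ over 100 ms gives 200000 uW in RC6. Since 2 * 200000 is below 1500000, the check would pass.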



