Re: [PATCH v6] drm/i915/selftests: Implement frequency logging for energy reading validation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 






On 20-11-2024 20:20, Nilawar, Badal wrote:


On 20-11-2024 16:00, Gupta, Anshuman wrote:


-----Original Message-----
From: Nilawar, Badal <badal.nilawar@xxxxxxxxx>
Sent: Wednesday, November 20, 2024 1:44 PM
To: Anirban, Sk <sk.anirban@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Gupta, Anshuman <anshuman.gupta@xxxxxxxxx>; Poosa, Karthik
<karthik.poosa@xxxxxxxxx>; Pottumuttu, Sai Teja
<sai.teja.pottumuttu@xxxxxxxxx>
Subject: Re: [PATCH v6] drm/i915/selftests: Implement frequency logging for
energy reading validation



On 13-11-2024 15:20, Sk Anirban wrote:
Introduce RC6 & RC0 frequency logging mechanism to ensure accurate
energy readings aimed at addressing GPU energy leaks and power
measurement failures.
This enhancement will help ensure the accuracy of energy readings.

v2:
    - Improved commit message.
v3:
    - Used pr_err log to display frequency (Anshuman)
    - Sorted headers alphabetically (Sai Teja)
v4:
    - Improved commit message.
    - Fix pr_err log (Sai Teja)
v5:
    - Add error & debug logging for RC0 power and frequency checks
(Anshuman)
v6:
    - Modify debug logging for RC0 power and frequency checks (Sai
Teja)

Signed-off-by: Sk Anirban <sk.anirban@xxxxxxxxx>
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@xxxxxxxxx>
---
   drivers/gpu/drm/i915/gt/selftest_rc6.c | 15 +++++++++++++--
   1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c
b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 1aa1446c8fb0..a8776f88d6a1 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -8,6 +8,7 @@
   #include "intel_gpu_commands.h"
   #include "intel_gt_requests.h"
   #include "intel_ring.h"
+#include "intel_rps.h"
   #include "selftest_rc6.h"

   #include "selftests/i915_random.h"
@@ -38,6 +39,9 @@ int live_rc6_manual(void *arg)
       ktime_t dt;
       u64 res[2];
       int err = 0;
+    u32 rc0_freq = 0;
+    u32 rc6_freq = 0;
+    struct intel_rps *rps = &gt->rps;

       /*
        * Our claim is that we can "encourage" the GPU to enter rc6 at will.
@@ -66,6 +70,7 @@ int live_rc6_manual(void *arg)
       rc0_power = librapl_energy_uJ() - rc0_power;
       dt = ktime_sub(ktime_get(), dt);
       res[1] = rc6_residency(rc6);
+    rc0_freq = intel_rps_read_actual_frequency(rps);
       if ((res[1] - res[0]) >> 10) {
           pr_err("RC6 residency increased by %lldus while disabled for
1000ms!\n",
                  (res[1] - res[0]) >> 10);
@@ -77,7 +82,11 @@ int live_rc6_manual(void *arg)
           rc0_power = div64_u64(NSEC_PER_SEC * rc0_power,
                         ktime_to_ns(dt));
           if (!rc0_power) {
-            pr_err("No power measured while in RC0\n");
+            if (rc0_freq)
+                pr_err("No power measured while in RC0!
GPU Freq: %u in RC0\n",
+                       rc0_freq);
If rc0 frequency is there then, this has to be pr_dbg, otherwise what is the purpose of this patch.

I too didn't understand purpose of this. How this going to help for accurate energy readings.
in case of rc0 power is 0 , I just want to confirm whether the freq is available there or not ? Also it is defined as pr_err because there is no point of validate rc0 power below if no power is being measured.

+            else
+                pr_err("No power and freq measured while in
RC0\n");
               err = -EINVAL;
               goto out_unlock;
           }
@@ -91,6 +100,7 @@ int live_rc6_manual(void *arg)
       dt = ktime_get();
       rc6_power = librapl_energy_uJ();
       msleep(100);
+    rc6_freq = intel_rps_read_actual_frequency(rps);

I think intention of reading frequency here is to know if device was not in RC6 when there is failure. But for the platforms below gen12 reading act frequency
will cause gt wake as GEN6_RPSTAT reg requires forcewake.
To avoid wake when device is in RC6 read actual frequency without applying
forcewake.
If reading act_freq will wake the device, How to read frequency without forcewake then ?

Use this api intel_rps_read_actual_frequency_fw to read freq without forcewake.

Regards,
Badal


Thanks,
Anshuaman

Additionally add delay, may be delay of 1 seconds after re-enabling RC6
manually and forcewake flush.
yeah, I can use that to read the actual freq but there is already a check involved just to cross verify whether the system has entered RC6 or not.

Regards,
Badal
Thanks,
Anirban

       rc6_power = librapl_energy_uJ() - rc6_power;
       dt = ktime_sub(ktime_get(), dt);
       res[1] = rc6_residency(rc6);
@@ -108,7 +118,8 @@ int live_rc6_manual(void *arg)
           pr_info("GPU consumed %llduW in RC0 and %llduW in
RC6\n",
               rc0_power, rc6_power);
           if (2 * rc6_power > rc0_power) {
-            pr_err("GPU leaked energy while in RC6!\n");
+            pr_err("GPU leaked energy while in RC6! GPU Freq:
%u in RC6 and %u in RC0\n",
+                   rc6_freq, rc0_freq);
               err = -EINVAL;
               goto out_unlock;
           }






[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux