Re: [PATCH v8 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Sumit,

On 2024/12/16 3:11 PM, Sumit Gupta wrote:


On 16/12/24 11:13, Kai-Heng Feng wrote:
Hi Beata,

On 2024/12/6 9:55 PM, Beata Michalska wrote:
Currently the CPUFreq core exposes two sysfs attributes that can be used
to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
and scaling_cur_freq. Both provide slightly different view on the
subject and they do come with their own drawbacks.

cpuinfo_cur_freq provides higher precision though at a cost of being
rather expensive. Moreover, the information retrieved via this attribute
is somewhat short lived as frequency can change at any point of time
making it difficult to reason from.

scaling_cur_freq, on the other hand, tends to be less accurate but then
the actual level of precision (and source of information) varies between
architectures making it a bit ambiguous.

The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
distinct interface, exposing an average frequency of a given CPU(s), as
reported by the hardware, over a time frame spanning no more than a few
milliseconds. As it requires appropriate hardware support, this
interface is optional.

Note that under the hood, the new attribute relies on the information
provided by arch_freq_get_on_cpu, which, up to this point, has been
feeding data for scaling_cur_freq attribute, being the source of
ambiguity when it comes to interpretation. This has been amended by
restoring the intended behavior for scaling_cur_freq, with a new
dedicated config option to maintain status quo for those, who may need
it.

CC: Jonathan Corbet <corbet@xxxxxxx>
CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: Borislav Petkov <bp@xxxxxxxxx>
CC: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
CC: H. Peter Anvin <hpa@xxxxxxxxx>
CC: Phil Auld <pauld@xxxxxxxxxx>
CC: x86@xxxxxxxxxx
CC: linux-doc@xxxxxxxxxxxxxxx
Signed-off-by: Beata Michalska <beata.michalska@xxxxxxx>
---
  Documentation/admin-guide/pm/cpufreq.rst | 16 ++++++++++-
  arch/x86/kernel/cpu/aperfmperf.c         |  2 +-
  arch/x86/kernel/cpu/proc.c               |  7 +++--
  drivers/cpufreq/Kconfig.x86              | 12 ++++++++
  drivers/cpufreq/cpufreq.c                | 36 +++++++++++++++++++++---
  include/linux/cpufreq.h                  |  2 +-
  6 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin- guide/pm/cpufreq.rst
index fe1be4ad88cb..76f3835afe01 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -248,6 +248,19 @@ are the following:
      If that frequency cannot be determined, this attribute should not
      be present.
+``cpuinfo_avg_freq``
+        An average frequency (in KHz) of all CPUs belonging to a given policy,
+        derived from a hardware provided feedback and reported on a time frame
+        spanning at most few milliseconds.
+
+        This is expected to be based on the frequency the hardware actually runs +        at and, as such, might require specialised hardware support (such as AMU
+        extension on ARM). If one cannot be determined, this attribute should
+        not be present.
+
+        Note, that failed attempt to retrieve current frequency for a given
+        CPU(s) will result in an appropriate error.
+
  ``cpuinfo_max_freq``
      Maximum possible operating frequency the CPUs belonging to this policy
      can run at (in kHz).
@@ -293,7 +306,8 @@ are the following:
      Some architectures (e.g. ``x86``) may attempt to provide information
      more precisely reflecting the current CPU frequency through this
      attribute, but that still may not be the exact current CPU frequency as
-    seen by the hardware at the moment.
+    seen by the hardware at the moment. This behavior though, is only
+    available via c:macro:``CPUFREQ_ARCH_CUR_FREQ`` option.
  ``scaling_driver``
      The scaling driver currently in use.
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 0b69bfbf345d..a00059139ca4 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -413,7 +413,7 @@ void arch_scale_freq_tick(void)
   */
  #define MAX_SAMPLE_AGE    ((unsigned long)HZ / 50)
-unsigned int arch_freq_get_on_cpu(int cpu)
+int arch_freq_get_on_cpu(int cpu)
  {
      struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
      unsigned int seq, freq;
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index e65fae63660e..34d8fb93fb70 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
          seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
      if (cpu_has(c, X86_FEATURE_TSC)) {
-        unsigned int freq = arch_freq_get_on_cpu(cpu);
+        int freq = arch_freq_get_on_cpu(cpu);
-        seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
+        if (freq <= 0)
+            seq_puts(m, "cpu MHz\t\t: Unknown\n");
+        else
+            seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
      }
      /* Cache size */
diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index 97c2d4f15d76..212e1b9afe21 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -340,3 +340,15 @@ config X86_SPEEDSTEP_RELAXED_CAP_CHECK
        option lets the probing code bypass some of those checks if the
        parameter "relaxed_check=1" is passed to the module.
+config CPUFREQ_ARCH_CUR_FREQ
+    default y
+    bool "Current frequency derived from HW provided feedback"
+    help
+      This determines whether the scaling_cur_freq sysfs attribute returns
+      the last requested frequency or a more precise value based on hardware
+      provided feedback (as architected counters).
+      Given that a more precise frequency can now be provided via the
+      cpuinfo_avg_cur_freq attribute, by enabling this option,
+      scaling_cur_freq maintains the provision of a counter based frequency,
+      for compatibility reasons.
+
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 04fc786dd2c0..70df2a24437b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -747,9 +747,14 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
  show_one(scaling_min_freq, min);
  show_one(scaling_max_freq, max);
-__weak unsigned int arch_freq_get_on_cpu(int cpu)
+__weak int arch_freq_get_on_cpu(int cpu)
  {
-    return 0;
+    return -EOPNOTSUPP;
+}
+
+static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
+{
+    return arch_freq_get_on_cpu(policy->cpu) != -EOPNOTSUPP;
  }
  static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
@@ -757,8 +762,11 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
      ssize_t ret;
      unsigned int freq;
-    freq = arch_freq_get_on_cpu(policy->cpu);
-    if (freq)
+    freq = IS_ENABLED(CONFIG_CPUFREQ_ARCH_CUR_FREQ)
+        ? arch_freq_get_on_cpu(policy->cpu)
+        : 0;
+
+    if (freq > 0)
          ret = sysfs_emit(buf, "%u\n", freq);
      else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
          ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
@@ -802,6 +810,19 @@ static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
      return sysfs_emit(buf, "<unknown>\n");
  }
+/*
+ * show_cpuinfo_avg_freq - average CPU frequency as detected by hardware
+ */
+static ssize_t show_cpuinfo_avg_freq(struct cpufreq_policy *policy,
+                     char *buf)
+{
+    int avg_freq = arch_freq_get_on_cpu(policy->cpu);

We are seeing issues when reading cpuinfo_avg_freq on an ARM64 system:

$ cat /sys/devices/system/cpu/cpufreq/policy1/cpuinfo_avg_freq
cat: /sys/devices/system/cpu/cpufreq/policy1/cpuinfo_avg_freq: Resource temporarily unavailable

The CPU is in idle state, so arch_freq_get_on_cpu() can't find a good alternative source for frequency info.


Hi Kai Heng,
This has already been discussed during v7 in [1] & [2].

Thanks for the info!

In v7, we were returning zero which printed 'unknown'.
The discussion was about printing in more descriptive way or with an appropriate error code. In v8 we are returning 'EAGAIN' instead of zero. The final decision was of Maintainers.

Is there any cpufreq driver that prints "unknown" or error when CPU is in idle?

I think it's more unsurprising to print the lowest CPU frequency when CPU is in idle state, instead of any other error code.

Kai-Heng


Viresh,
You have any preference on this?

[1] https://lore.kernel.org/lkml/aa254516-968e-4665-bb5b-981c296ffc35@xxxxxxxxxx/#t
[2] https://lore.kernel.org/lkml/Zyh-uVSW-0d0r8oB@xxxxxxx/

Thank you,
Sumit Gupta

One way to resolve this is to have fallback methods in show_cpuinfo_avg_freq() so it will look like this:

static ssize_t show_cpuinfo_avg_freq(struct cpufreq_policy *policy,
                                      char *buf)
{
         int avg_freq = arch_freq_get_on_cpu(policy->cpu);
         int ret;

         if (avg_freq > 0)
                 ret = sysfs_emit(buf, "%u\n", avg_freq);
         else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
                 ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
         else
                 ret = sysfs_emit(buf, "%u\n", policy->cur);
         return ret;
}

But that also makes show_cpuinfo_avg_freq() pretty much the same as show_scaling_cur_freq().

So is it possible to consolidate show_cpuinfo_avg_freq() into show_scaling_cur_freq(), by making CONFIG_CPUFREQ_ARCH_CUR_FREQ also available to ARM64?

Kai-Heng

+
+    if (avg_freq > 0)
+        return sysfs_emit(buf, "%u\n", avg_freq);
+    return avg_freq != 0 ? avg_freq : -EINVAL;
+}
+
  /*
   * show_scaling_governor - show the current policy for the specified CPU
   */
@@ -964,6 +985,7 @@ static ssize_t show_bios_limit(struct cpufreq_policy *policy, char *buf)
  }
  cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
+cpufreq_freq_attr_ro(cpuinfo_avg_freq);
  cpufreq_freq_attr_ro(cpuinfo_min_freq);
  cpufreq_freq_attr_ro(cpuinfo_max_freq);
  cpufreq_freq_attr_ro(cpuinfo_transition_latency);
@@ -1091,6 +1113,12 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy)
              return ret;
      }
+    if (cpufreq_avg_freq_supported(policy)) {
+        ret = sysfs_create_file(&policy->kobj, &cpuinfo_avg_freq.attr);
+        if (ret)
+            return ret;
+    }
+
      ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
      if (ret)
          return ret;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index d4d2f4d1d7cb..a7b6c0ccf9bc 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1194,7 +1194,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
  }
  #endif
-extern unsigned int arch_freq_get_on_cpu(int cpu);
+extern int arch_freq_get_on_cpu(int cpu);
  #ifndef arch_set_freq_scale
  static __always_inline






[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux