Re: [PATCH v3 1/2] ACPI: CPPC: Disable FIE if registers in PCC regions

Viresh Kumar <viresh.kumar@xxxxxxxxxx> · Tue, 30 Aug 2022 08:57:39 +0530



+Vincent.

On 24-08-22, 15:41, Lukasz Luba wrote:
> Hi Jeremy,
> 
> +CC Dietmar, Morten and Souvik
> 
> On 8/18/22 22:16, Jeremy Linton wrote:
> > PCC regions utilize a mailbox to set/retrieve register values used by
> > the CPPC code. This is fine as long as the operations are
> > infrequent. With the FIE code enabled though the overhead can range
> > from 2-11% of system CPU overhead (ex: as measured by top) on Arm
> > based machines.
> > 
> > So, before enabling FIE assure none of the registers used by
> > cppc_get_perf_ctrs() are in the PCC region. Furthermore lets also
> > enable a module parameter which can also disable it at boot or module
> > reload.
> > 
> > Signed-off-by: Jeremy Linton <jeremy.linton@xxxxxxx>
> > ---
> >   drivers/acpi/cppc_acpi.c       | 41 ++++++++++++++++++++++++++++++++++
> >   drivers/cpufreq/cppc_cpufreq.c | 31 +++++++++++++++++++++----
> >   include/acpi/cppc_acpi.h       |  5 +++++
> >   3 files changed, 73 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index 1e15a9f25ae9..c840bf606b30 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -1240,6 +1240,47 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
> >   }
> >   EXPORT_SYMBOL_GPL(cppc_get_perf_caps);
> > +/**
> > + * cppc_perf_ctrs_in_pcc - Check if any perf counters are in a PCC region.
> > + *
> > + * CPPC has flexibility about how counters describing CPU perf are delivered.
> > + * One of the choices is PCC regions, which can have a high access latency. This
> > + * routine allows callers of cppc_get_perf_ctrs() to know this ahead of time.
> > + *
> > + * Return: true if any of the counters are in PCC regions, false otherwise
> > + */
> > +bool cppc_perf_ctrs_in_pcc(void)
> > +{
> > +	int cpu;
> > +
> > +	for_each_present_cpu(cpu) {
> > +		struct cpc_register_resource *ref_perf_reg;
> > +		struct cpc_desc *cpc_desc;
> > +
> > +		cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> > +
> > +		if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
> > +		    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
> > +		    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
> > +			return true;
> > +
> > +
> > +		ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
> > +
> > +		/*
> > +		 * If reference perf register is not supported then we should
> > +		 * use the nominal perf value
> > +		 */
> > +		if (!CPC_SUPPORTED(ref_perf_reg))
> > +			ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
> > +
> > +		if (CPC_IN_PCC(ref_perf_reg))
> > +			return true;
> > +	}
> 
> Do we have a platform which returns false here?
> 
> > +	return false;
> > +}
> > +EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);
> > +
> >   /**
> >    * cppc_get_perf_ctrs - Read a CPU's performance feedback counters.
> >    * @cpunum: CPU from which to read counters.
> > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > index 24eaf0ec344d..32fcb0bf74a4 100644
> > --- a/drivers/cpufreq/cppc_cpufreq.c
> > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > @@ -63,7 +63,15 @@ static struct cppc_workaround_oem_info wa_info[] = {
> >   static struct cpufreq_driver cppc_cpufreq_driver;
> > +static enum {
> > +	FIE_UNSET = -1,
> > +	FIE_ENABLED,
> > +	FIE_DISABLED
> > +} fie_disabled = FIE_UNSET;
> > +
> >   #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
> > +module_param(fie_disabled, int, 0444);
> > +MODULE_PARM_DESC(fie_disabled, "Disable Frequency Invariance Engine (FIE)");
> 
> Why we need the modules support?
> I would drop this, since the fie_disabled would be set properly when
> needed. The code would be cleaner (more below).
> 
> >   /* Frequency invariance support */
> >   struct cppc_freq_invariance {
> > @@ -158,7 +166,7 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> >   	struct cppc_freq_invariance *cppc_fi;
> >   	int cpu, ret;
> > -	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > +	if (fie_disabled)
> >   		return;
> >   	for_each_cpu(cpu, policy->cpus) {
> > @@ -199,7 +207,7 @@ static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> >   	struct cppc_freq_invariance *cppc_fi;
> >   	int cpu;
> > -	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > +	if (fie_disabled)
> >   		return;
> >   	/* policy->cpus will be empty here, use related_cpus instead */
> > @@ -229,7 +237,21 @@ static void __init cppc_freq_invariance_init(void)
> >   	};
> >   	int ret;
> > -	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > +	switch (fie_disabled) {
> > +	/* honor user request */
> > +	case FIE_DISABLED:
> > +	case FIE_ENABLED:
> 
> This module's over-write doesn't look 'clean'.
> Is it OK to allow a user to go with the poor performing
> system (likely on many platforms)? Or we assume that there are
> platforms which has a bit faster mailboxes and they already
> have the FIE issue impacting task's utilization measurements.
> 
> It looks like we are not sure about the solution. On one hand
> we implement those checks in the cppc_perf_ctrs_in_pcc()
> which could set the flag, but on the other hand we allow user
> to decide. IMO this creates diversity that we are not able to control.
> It creates another tunable knob in the kernel, which then is forgotten
> to check.
> 
> I still haven't seen information that the old FIE was an issue on those
> servers and had impact on task utilization measurements. This should be
> a main requirement for this new feature. This would be after we proved
> that the utilization problem was due to the FIE and not something else (like
> uArch variation or workload variation).
> 
> IMO let's revert the ACPI_CPPC_CPUFREQ_FIE. When we get data that
> FIE is an issue on those servers we can come back to this topic.
> 
> Regards,
> Lukasz

-- 
viresh