Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance

Ionela Voinescu <ionela.voinescu@xxxxxxx> · Tue, 28 Jan 2020 17:36:13 +0000

Hi Lukasz,

On Friday 24 Jan 2020 at 15:17:48 (+0000), Lukasz Luba wrote:
[..]
> > >   static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
> > >   {
> > > +	u64 core_cnt, const_cnt;
> > > +
> > >   	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> > >   		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
> > >   			smp_processor_id());
> > > -		this_cpu_write(amu_feat, 1);
> > > +		core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> > > +		const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> > > +
> > > +		this_cpu_write(arch_core_cycles_prev, core_cnt);
> > > +		this_cpu_write(arch_const_cycles_prev, const_cnt);
> > > +
> > > +		this_cpu_write(amu_scale_freq, 1);
> > > +	} else {
> > > +		this_cpu_write(amu_scale_freq, 2);
> > >   	}
> > >   }
> > 
> > 
> > Yes, functionally this can be done here (it would need some extra checks
> > on the initial values of core_cnt and const_cnt), but what I was saying
> > in my previous comment is that I don't want to mix generic feature
> > detection, which should happen here, with counter validation for
> > frequency invariance. As you see, this would already bring here per-cpu
> > variables for counters and amu_scale_freq flag, and I only see this
> > getting more messy with the future use of more counters. I don't believe
> > this code belongs here.
> > 
> > Looking a bit more over the code and checking against the new frequency
> > invariance code for x86, there is a case of either doing this CPU
> > validation in smp_prepare_cpus (separately for arm64 and x86) or calling
> > an arch_init_freq_invariance() maybe in sched_init_smp to be defined with
> > the proper frequency invariance counter initialisation code separately
> > for x86 and arm64. I'll have to look more over the details to make sure
> > this is feasible.
> 
> I have found that we could simply draw on from Mark's solution to
> similar problem. In commit:
> 
> commit df857416a13734ed9356f6e4f0152d55e4fb748a
> Author: Mark Rutland <mark.rutland@xxxxxxx>
> Date:   Wed Jul 16 16:32:44 2014 +0100
> 
>     arm64: cpuinfo: record cpu system register values
> 
>     Several kernel subsystems need to know details about CPU system register
>     values, sometimes for CPUs other than that they are executing on. Rather
>     than hard-coding system register accesses and cross-calls for these
>     cases, this patch adds logic to record various system register values at
>     boot-time. This may be used for feature reporting, firmware bug
>     detection, etc.
> 
>     Separate hooks are added for the boot and hotplug paths to enable
>     one-time intialisation and cold/warm boot value mismatch detection in
>     later patches.
> 
>     Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
>     Reviewed-by: Will Deacon <will.deacon@xxxxxxx>
>     Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx>
>     Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx>
> 
> 
> He added cpuinfo_store_cpu() call in secondary_start_kernel()
> [in arm64 smp.c]. Please check the file:
> arch/arm64/kernel/cpuinfo.c
> 
> We can probably add our read-amu-regs-and-setup-invariance call
> just below his cpuinfo_store_cpu.
> 
> Then the arm64 cpufeature.c would be clean, we will be called for
> each cpu, late_initcal() will finish setup with edge case policy
> check like in the init_amu_feature() code below.
> 

Yes, this should work: calling a AMU per_cpu validation function in
setup_processor for the boot CPU and in secondary_start_kernel for
secondary and hotplugged CPUs.

I would still like to bring this closer to the scheduler
(sched_init_smp) as frequency invariance is a functionality needed by
the scheduler and its initialisation should be part of scheduler init
code. But this together with needed interfaces for other architectures
can be done in a separate patchset that is not so AMU/arm64 specific.

[..]
> > 
> > Yes, with the design I mentioned above, this CPU policy validation could
> > move to a late_initcall and I could drop the workqueues and the extra
> > data structure. Thanks for this!
> > 
> > Let me know what you think!
> > 
> 
> One think is still open, the file drivers/base/arch_topology.c and
> #ifdef in function arch_set_freq_scale().
> 
> Generally, if there is such need, it's better to put such stuff into the
> header and make dual implementation not polluting generic code with:
> #if defined(CONFIG_ARM64_XZY)
> #endif
> #if defined(CONFIG_POWERPC_ABC)
> #endif
> #if defined(CONFIG_x86_QAZ)
> #endif
> ...
> 
> 
> In our case we would need i.e. linux/topology.h because it includes
> asm/topology.h, which might provide a needed symbol. At the end of
> linux/topology.h we can have:
> 
> #ifndef arch_cpu_auto_scaling
> static __always_inline
> bool arch_cpu_auto_scaling(void) { return False; }
> #endif
> 
> Then, when the symbol was missing and we got the default one,
> it should be easily optimized by the compiler.
> 
> We could have a much cleaner function arch_set_freq_scale()
> in drivers/base/ and all architecture will deal with specific
> #ifdef CONFIG in their <asm/topology.h> implementations or
> use default.
> 
> Example:
> arch_set_freq_scale()
> {
> 	unsigned long scale;
> 	int i;
> 	
> 	if (arch_cpu_auto_scaling(cpu))
> 		return;
> 
> 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
> 	for_each_cpu(i, cpus)
> 		per_cpu(freq_scale, i) = scale;
> }
> 
> Regards,
> Lukasz
>

Okay, it does look nice and clean. Let me give this a try in v3.

Thank you very much,
Ionela.