Hi Lukasz,

On Friday 24 Jan 2020 at 15:17:48 (+0000), Lukasz Luba wrote:
[..]
> > >  static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
> > >  {
> > > +	u64 core_cnt, const_cnt;
> > > +
> > >  	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> > >  		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
> > >  			smp_processor_id());
> > > -		this_cpu_write(amu_feat, 1);
> > > +		core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> > > +		const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> > > +
> > > +		this_cpu_write(arch_core_cycles_prev, core_cnt);
> > > +		this_cpu_write(arch_const_cycles_prev, const_cnt);
> > > +
> > > +		this_cpu_write(amu_scale_freq, 1);
> > > +	} else {
> > > +		this_cpu_write(amu_scale_freq, 2);
> > >  	}
> > >  }
> > >
> >
> > Yes, functionally this can be done here (it would need some extra checks
> > on the initial values of core_cnt and const_cnt), but what I was saying
> > in my previous comment is that I don't want to mix generic feature
> > detection, which should happen here, with counter validation for
> > frequency invariance. As you see, this would already bring here per-cpu
> > variables for counters and amu_scale_freq flag, and I only see this
> > getting more messy with the future use of more counters. I don't believe
> > this code belongs here.
> >
> > Looking a bit more over the code and checking against the new frequency
> > invariance code for x86, there is a case of either doing this CPU
> > validation in smp_prepare_cpus (separately for arm64 and x86) or calling
> > an arch_init_freq_invariance() maybe in sched_init_smp to be defined with
> > the proper frequency invariance counter initialisation code separately
> > for x86 and arm64. I'll have to look more over the details to make sure
> > this is feasible.
>
> I have found that we could simply draw on Mark's solution to a
> similar problem.
> In commit:
>
> commit df857416a13734ed9356f6e4f0152d55e4fb748a
> Author: Mark Rutland <mark.rutland@xxxxxxx>
> Date:   Wed Jul 16 16:32:44 2014 +0100
>
>     arm64: cpuinfo: record cpu system register values
>
>     Several kernel subsystems need to know details about CPU system register
>     values, sometimes for CPUs other than that they are executing on. Rather
>     than hard-coding system register accesses and cross-calls for these
>     cases, this patch adds logic to record various system register values at
>     boot-time. This may be used for feature reporting, firmware bug
>     detection, etc.
>
>     Separate hooks are added for the boot and hotplug paths to enable
>     one-time intialisation and cold/warm boot value mismatch detection in
>     later patches.
>
>     Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
>     Reviewed-by: Will Deacon <will.deacon@xxxxxxx>
>     Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx>
>     Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx>
>
>
> He added a cpuinfo_store_cpu() call in secondary_start_kernel()
> [in arm64 smp.c]. Please check the file:
> arch/arm64/kernel/cpuinfo.c
>
> We can probably add our read-amu-regs-and-setup-invariance call
> just below his cpuinfo_store_cpu().
>
> Then the arm64 cpufeature.c would be clean, we will be called for
> each CPU, and a late_initcall() will finish the setup with an edge-case
> policy check like in the init_amu_feature() code below.
>

Yes, this should work: calling an AMU per-CPU validation function in
setup_processor() for the boot CPU and in secondary_start_kernel() for
secondary and hotplugged CPUs.

I would still like to bring this closer to the scheduler (sched_init_smp),
as frequency invariance is a functionality needed by the scheduler and its
initialisation should be part of the scheduler init code. But this, together
with the interfaces needed for other architectures, can be done in a
separate patchset that is not so AMU/arm64 specific.

[..]
> >
> > Yes, with the design I mentioned above, this CPU policy validation could
> > move to a late_initcall and I could drop the workqueues and the extra
> > data structure. Thanks for this!
> >
> > Let me know what you think!
> >
>
> One thing is still open: the file drivers/base/arch_topology.c and the
> #ifdef in function arch_set_freq_scale().
>
> Generally, if there is such a need, it's better to put such stuff into
> the header and provide a dual implementation, rather than polluting
> generic code with:
> #if defined(CONFIG_ARM64_XZY)
> #endif
> #if defined(CONFIG_POWERPC_ABC)
> #endif
> #if defined(CONFIG_x86_QAZ)
> #endif
> ...
>
>
> In our case we would need e.g. linux/topology.h, because it includes
> asm/topology.h, which might provide the needed symbol. At the end of
> linux/topology.h we can have:
>
> #ifndef arch_cpu_auto_scaling
> static __always_inline
> bool arch_cpu_auto_scaling(struct cpumask *cpus) { return false; }
> #endif
>
> Then, when the symbol is missing and we get the default one,
> the check should be easily optimized away by the compiler.
>
> We could have a much cleaner function arch_set_freq_scale()
> in drivers/base/, and all architectures will deal with their specific
> #ifdef CONFIG in their <asm/topology.h> implementations or
> use the default.
>
> Example:
> void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
> 			 unsigned long max_freq)
> {
> 	unsigned long scale;
> 	int i;
>
> 	if (arch_cpu_auto_scaling(cpus))
> 		return;
>
> 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
> 	for_each_cpu(i, cpus)
> 		per_cpu(freq_scale, i) = scale;
> }
>
> Regards,
> Lukasz
>

Okay, it does look nice and clean. Let me give this a try in v3.

Thank you very much,
Ionela.