On Mon, Jul 31, 2023 at 10:46:09AM -0700, David Dai wrote: > Introduce a virtualized cpufreq driver for guest kernels to improve > performance and power of workloads within VMs. > > This driver does two main things: > > 1. Sends the frequency of vCPUs as a hint to the host. The host uses the > hint to schedule the vCPU threads and decide physical CPU frequency. > > 2. If a VM does not support a virtualized FIE(like AMUs), it queries the > host CPU frequency by reading a MMIO region of a virtual cpufreq device > to update the guest's frequency scaling factor periodically. This enables > accurate Per-Entity Load Tracking for tasks running in the guest. > > Co-developed-by: Saravana Kannan <saravanak@xxxxxxxxxx> > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx> > Signed-off-by: David Dai <davidai@xxxxxxxxxx> [...] > +static void virt_scale_freq_tick(void) > +{ > + struct cpufreq_policy *policy = cpufreq_cpu_get(smp_processor_id()); > + struct virt_cpufreq_drv_data *data = policy->driver_data; > + u32 max_freq = (u32)policy->cpuinfo.max_freq; > + u64 cur_freq; > + u64 scale; > + > + cpufreq_cpu_put(policy); > + > + cur_freq = (u64)data->ops->get_freq(policy); > + cur_freq <<= SCHED_CAPACITY_SHIFT; > + scale = div_u64(cur_freq, max_freq); > + > + this_cpu_write(arch_freq_scale, (unsigned long)scale); > +} > + We expect the host to provide the frequency in kHz, can you please add a comment about it. It is not very obvious when you look at the REG_CUR_FREQ_OFFSET register name. > +static struct scale_freq_data virt_sfd = { > + .source = SCALE_FREQ_SOURCE_VIRT, > + .set_freq_scale = virt_scale_freq_tick, > +}; > + > +static unsigned int virt_cpufreq_set_perf(struct cpufreq_policy *policy) > +{ > + struct virt_cpufreq_drv_data *data = policy->driver_data; > + /* > + * Use cached frequency to avoid rounding to freq table entries > + * and undo 25% frequency boost applied by schedutil. > + */ > + u32 freq = mult_frac(policy->cached_target_freq, 80, 100); > + > + data->ops->set_freq(policy, freq); > + return 0; > +} Why do we undo the frequency boost? A governor may apply other boosts like RT (uclamp), iowait. It is not clear why we need to worry about governor policies here. > + > +static unsigned int virt_cpufreq_fast_switch(struct cpufreq_policy *policy, > + unsigned int target_freq) > +{ > + virt_cpufreq_set_perf(policy); > + return target_freq; > +} > + > +static int virt_cpufreq_target_index(struct cpufreq_policy *policy, > + unsigned int index) > +{ > + return virt_cpufreq_set_perf(policy); > +} > + > +static int virt_cpufreq_cpu_init(struct cpufreq_policy *policy) > +{ > + struct virt_cpufreq_drv_data *drv_data = cpufreq_get_driver_data(); > + struct cpufreq_frequency_table *table; > + struct device *cpu_dev; > + int ret; > + > + cpu_dev = get_cpu_device(policy->cpu); > + if (!cpu_dev) > + return -ENODEV; > + > + ret = dev_pm_opp_of_add_table(cpu_dev); > + if (ret) > + return ret; > + > + ret = dev_pm_opp_get_opp_count(cpu_dev); > + if (ret <= 0) { > + dev_err(cpu_dev, "OPP table can't be empty\n"); > + return -ENODEV; > + } > + > + ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &table); > + if (ret) { > + dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret); > + return ret; > + } > + > + policy->freq_table = table; > + policy->dvfs_possible_from_any_cpu = false; > + policy->fast_switch_possible = true; > + policy->driver_data = drv_data; > + > + /* > + * Only takes effect if another FIE source such as AMUs > + * have not been registered. > + */ > + topology_set_scale_freq_source(&virt_sfd, policy->cpus); > + > + return 0; > + > +} > + Do we need to register as FIE source even with the below commit? By registering as a source, we are not supplying any accurate metric. We still fallback on the same source that cpufreq implements it. 874f63531064 ("cpufreq: report whether cpufreq supports Frequency Invariance (FI)") Thanks, Pavan