Re: [PATCH v3 2/2] cpufreq: add virtual-cpufreq driver

Saravana Kannan <saravanak@xxxxxxxxxx> · Fri, 4 Aug 2023 15:23:04 -0700

On Tue, Aug 1, 2023 at 2:45 AM Quentin Perret <qperret@xxxxxxxxxx> wrote:
>
> Hi David,
>
> On Monday 31 Jul 2023 at 10:46:09 (-0700), David Dai wrote:
> > +static unsigned int virt_cpufreq_set_perf(struct cpufreq_policy *policy)
> > +{
> > +     struct virt_cpufreq_drv_data *data = policy->driver_data;
> > +     /*
> > +      * Use cached frequency to avoid rounding to freq table entries
> > +      * and undo 25% frequency boost applied by schedutil.
> > +      */
>
> The VMM would be a better place for this scaling I think, the driver
> can't/shouldn't make assumptions about the governor it is running with
> given that this is a guest userspace decision essentially.
>
> IIRC the fast_switch() path is only used by schedutil, so one could
> probably make a case to scale things there, but it'd be inconsistent
> with the "slow" switch case, and would create a fragile dependency, so
> it's probably not worth pursuing.

Thanks for the input Quentin!

David and I spend several hours over several days discussing this. We
were trying to think through and decide if we were really removing the
25% margin applied by the guest side schedutil or the host side
schedutil. We ran through different thought experiments on what would
happen if the guest used ondemand/conservative/performance/powersave
governors and what if in the future we had a configurable schedutil
margin.

We changed our opinions multiple times until we finally remembered
this goal from my original presentation[1]:

"On an idle host, running the use case in the host vs VM, should
result in close to identical DVFS behavior of the physical CPUs and
CPU selection for the threads."

For that statement to be true when the guest uses
ondemand/conservative governor, we have to remove the 25% margin
applied by the host side schedutil governor. Otherwise, running the
workload on the VM will result in frequencies 25% higher than running
the same load on the host with ondemand/conservative governor.

So, we finally concluded that we are really undoing the host side
schedutil margin. And in that case, it makes sense to undo this in the
VMM side. So, we'll go with your suggestion in this email instead of
making the schedutil margin to be 0 for the guest.

[1] - https://lpc.events/event/16/contributions/1195/attachments/970/1893/LPC%202022%20-%20VM%20DVFS.pdf

Thanks,
Saravana

>
> > +     u32 freq = mult_frac(policy->cached_target_freq, 80, 100);
> > +
> > +     data->ops->set_freq(policy, freq);
> > +     return 0;
> > +}
>
> Thanks,
> Quentin