Re: [patch 0/3] KVM CPU frequency change hypercalls (resend)

On Thu, Mar 02, 2017 at 11:15:00AM +0100, Paolo Bonzini wrote:
> 
> 
> On 01/03/2017 16:04, Marcelo Tosatti wrote:
> > 
> > Paolo: please comment on your objections and what should
> > be done instead. Note the case "multiple vcpus 
> > on a given pcpu" is not part of the usecase in question.
> 
> I would like to understand the intended usecase of cpufreq-userspace.
> 
> My understanding is that you would have a daemon handling a systemwide
> policy; examples are the historical (and now obsolete) users such as
> cpufreqd, cpudyn, powernowd, or cpuspeed.
> 
> The user alternatively can play the role of the daemon by writing to
> sysfs, but I've never seen userspace tasks talking to cpufreq-userspace
> to set their own running frequency.
>
> Apparently DPDK does that, and I would like to know the opinion of the
> linux-pm folks; 

Only the number of in-use RX/TX queue entries lets you set the
processor frequency correctly (for this use case, where the machine
performs nothing but network processing).
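As a rough illustration of that idea (not code from the patch set or
from DPDK), the sketch below maps ring occupancy to one of the
frequency steps the CPU offers. The function name, the linear policy
and the example frequency list are all assumptions made for this
example; a real poll-mode application would use its own thresholds.

```python
def pick_frequency_khz(in_use, ring_size, available_khz):
    """Pick a frequency step from ring occupancy (hypothetical policy).

    in_use        -- number of RX/TX ring entries currently occupied
    ring_size     -- total number of entries in the ring
    available_khz -- frequency steps offered by the CPU, in kHz
    """
    if ring_size <= 0:
        raise ValueError("ring_size must be positive")
    if not available_khz:
        raise ValueError("available_khz must not be empty")

    # Clamp occupancy to [0, 1] so bad counters cannot index out of range.
    occupancy = min(max(in_use / ring_size, 0.0), 1.0)

    steps = sorted(available_khz)
    # Assumed policy: scale the step index linearly with occupancy,
    # so a full ring always selects the highest frequency.
    index = min(int(occupancy * len(steps)), len(steps) - 1)
    return steps[index]
```

An empty ring selects the lowest step and a full ring the highest, so
the frequency only drops while the queues show spare capacity.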

>  one obvious downside is that any application that you
> run after DPDK will have its CPU frequency hardcoded to something that
> is not appropriate.  

To isolate the CPU where DPDK runs it is already necessary to perform
special procedures such as changing the cpumask of other tasks, changing
cpumask of interrupt handlers (to remove the isolated CPU from that
cpumask), etc. Changing the cpufreq governor to userspace is another
step of that setup phase.

On shutdown (or CPU unpin), you can switch the CPU back to the previous
governor, which can then set the frequency to whatever it finds suitable.
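A minimal sketch of that setup and teardown step, assuming the usual
cpufreq sysfs files (scaling_governor and scaling_setspeed) and a
cpufreq driver that offers the userspace governor. The helper names and
the "root" parameter are made up for this example; writing to the real
sysfs files requires root privileges.

```python
import os
from contextlib import contextmanager

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def _cpufreq_path(cpu, name, root=SYSFS_CPU_ROOT):
    # e.g. /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
    return os.path.join(root, f"cpu{cpu}", "cpufreq", name)


def _read(path):
    with open(path) as f:
        return f.read().strip()


def _write(path, value):
    with open(path, "w") as f:
        f.write(str(value))


@contextmanager
def userspace_governor(cpu, root=SYSFS_CPU_ROOT):
    """Switch `cpu` to the userspace governor, restoring the old one on exit.

    Yields a function that sets the frequency (in kHz) for that CPU.
    """
    gov_path = _cpufreq_path(cpu, "scaling_governor", root)
    speed_path = _cpufreq_path(cpu, "scaling_setspeed", root)

    previous = _read(gov_path)       # remember the governor in use
    _write(gov_path, "userspace")    # part of the isolation setup phase
    try:
        yield lambda khz: _write(speed_path, khz)
    finally:
        # Shutdown or CPU unpin: hand control back to the old governor.
        _write(gov_path, previous)


# Usage (hypothetical CPU number and frequency):
#
#     with userspace_governor(3) as set_khz:
#         set_khz(1800000)
#         ...  # run the packet-processing loop
```

The "finally" clause restores the previous governor even if the
application exits with an error, so later users of the CPU are not left
with a hardcoded frequency.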

> This might be acceptable for DPDK, but it is worse
> for KVM which tries to provide isolation to its vCPU tasks.

Well, in this case you know that the only program executing on the CPU
is the one handling network packets, and therefore you allow that
program to control the frequency.

> Here are two possibilities that I could think of:
> 
> 1) Introduce a mechanism that allows a task to override the governor's
> choice of CPU frequency.  This could be a ioctl, a prctl, a cgroup-based
> mechanism or whatever else.  As Marcelo pointed out in the original kvm@
> thread, the latency and overhead of switching frequencies make it
> impractical to associate a desired CPU frequency with a task, because
> multiple tasks could be requesting a given frequency.  One possibility
> could be to treat the per-task CPU frequency as advisory

DPDK can't afford to have the frequency treated as advisory: a failure
to set the processor frequency when requested means dropped packets
(and not dropping packets is a requirement).

>  and only obey
> it in restricted cases---for example only if nohz_full is in effect.

From the cpufreq documentation:

"On all other cpufreq implementations, these boundaries still need to
be set. Then, a "governor" must be selected. Such a "governor" decides
what speed the processor shall run within the boundaries. One such
"governor" is the "userspace" governor. This one allows the user - or
a yet-to-implement userspace program - to decide what specific speed
the processor shall run at."

(It seems the cpufreq-hypercall+cpufreq-userspace combination is in
accordance with what cpufreq-userspace was designed for.)

Secondly, setting frequencies for multiple tasks is somewhat
contradictory:

In the DPDK context, or in any context actually, it makes sense for a
program to lower the processor frequency when it decides that a lower
frequency is sufficient to handle the job: that is, after lowering the
frequency it will still be possible to handle the load.

With multiple applications sharing that processor, the percentage
of time given to a certain application also affects how long it
takes to handle the job. So frequency is not the only variable that
affects "instructions per second": the timeslice given to the task
by the scheduler matters as well.

Having a task request a particular frequency in that case becomes
ambiguous: it could really be asking for an "increased timeslice".
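
To make the ambiguity concrete, here is a small worked example. It
assumes a deliberately simplified model in which the cycles a task
receives per second are just frequency times its share of CPU time;
the function name and the numbers are illustrative only.

```python
def cycles_per_second(freq_hz, cpu_share):
    """Cycles a task receives per second under a simplified model.

    freq_hz   -- processor frequency in Hz
    cpu_share -- fraction of CPU time the scheduler gives the task (0..1)
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    return freq_hz * cpu_share


# A task that owns the CPU at 1 GHz ...
alone = cycles_per_second(1_000_000_000, 1.0)
# ... gets the same cycles as one with half the CPU time at 2 GHz.
shared = cycles_per_second(2_000_000_000, 0.5)
```

Under this model both tasks see the same throughput, so on a shared
CPU a request for "more frequency" cannot be told apart from a request
for a larger share of time.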






