Re: [patch 0/3] KVM CPU frequency change hypercalls (resend)

On 02/03/2017 14:59, Marcelo Tosatti wrote:
> On Thu, Mar 02, 2017 at 11:15:00AM +0100, Paolo Bonzini wrote:
>> One obvious downside is that any application that you
>> run after DPDK will have its CPU frequency hardcoded to something that
>> is not appropriate.
> 
> To isolate the CPU where DPDK runs, it is already necessary to perform
> special procedures such as changing the cpumask of other tasks, changing
> the cpumask of interrupt handlers (to remove the isolated CPU from that
> cpumask), etc. Changing the cpufreq governor to userspace is another
> step of that setup phase.
> 
> On shutdown (or CPU unpin), you can switch the CPU back to the previous
> governor, which can then set the frequency to whatever it finds suitable.

But I thought that one of the reasons to do NFV is to simplify this
setup.  If you now have to do the same thing on virtual machines, things
become more complicated to set up, and I don't think that NFV virtual
machines are _that_ special.

In addition, in the list of setup steps above you forgot "chmod the
sysfs files for cpufreq so that DPDK can access them".  Doing that chmod
is a very explicit act; nothing comparable is required with this patch.
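For reference, the host-side setup step under discussion (switch to the userspace governor, pin a speed, restore the previous governor on unpin) might look roughly like the sketch below. It is a minimal illustration, not part of the patch: it assumes the standard cpufreq sysfs layout (cpuN/cpufreq/scaling_governor and scaling_setspeed under /sys/devices/system/cpu), and the helper names pin_frequency and restore_governor are hypothetical. On a real system the writes only succeed as root or after the chmod mentioned above.

```python
from pathlib import Path

# Standard sysfs location; tests can pass a temporary directory instead.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def _cpufreq_dir(cpu: int, base: Path = SYSFS_CPU) -> Path:
    """Directory holding the cpufreq attributes for one CPU."""
    return base / f"cpu{cpu}" / "cpufreq"


def pin_frequency(cpu: int, khz: int, base: Path = SYSFS_CPU) -> str:
    """Hypothetical helper: switch `cpu` to the userspace governor and
    fix its speed (in kHz).

    Returns the previous governor so the caller can restore it later.
    """
    d = _cpufreq_dir(cpu, base)
    previous = (d / "scaling_governor").read_text().strip()
    (d / "scaling_governor").write_text("userspace\n")
    # scaling_setspeed only takes effect while the userspace governor is active.
    (d / "scaling_setspeed").write_text(f"{khz}\n")
    return previous


def restore_governor(cpu: int, governor: str, base: Path = SYSFS_CPU) -> None:
    """Hypothetical helper: on shutdown or CPU unpin, hand control back
    to the previously active governor."""
    (_cpufreq_dir(cpu, base) / "scaling_governor").write_text(f"{governor}\n")
```

Every step here is an explicit, privileged write by the administrator; a guest hypercall that ends in the same writes would skip that explicit step.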

By letting virtual machines do the same with a simple hypercall, you're
giving whoever opens /dev/kvm powers that they didn't have before
(unless the userspace process also had access to sysfs).  Worse, the
effects last beyond the moment /dev/kvm is closed.

So, the question then is how to design the hypervisor so that these NFV
virtual machines can play with cpufreq, but there are no adverse
indefinite effects.  One possibility is to have some kind of per-task
cpufreq.  Another is to do everything in userspace with virtual ACPI
P-states and the userspace governor in the VM.

I was hoping to get more feedback from linux-pm.

>> Here are two possibilities that I could think of:
>>
>> 1) Introduce a mechanism that allows a task to override the governor's
>> choice of CPU frequency.  This could be an ioctl, a prctl, a cgroup-based
>> mechanism or whatever else.  As Marcelo pointed out in the original kvm@
>> thread, the latency and overhead of switching frequencies make it
>> impractical to associate a desired CPU frequency with a task, because
>> multiple tasks could be requesting a given frequency.  One possibility
>> could be to treat the per-task CPU frequency as advisory
> 
> DPDK can't afford to treat the frequency as advisory: failure to set the
> processor frequency when requested means dropped packets (and not
> dropping packets is a requirement).

It can be advisory if you document a proper configuration where it's obeyed.

Paolo

>>  and only obey
>> it in restricted cases---for example only if nohz_full is in effect.
> 
> From cpufreq documentation:
> 
> "On all other cpufreq implementations, these boundaries still need to
> be set. Then, a "governor" must be selected. Such a "governor" decides
> what speed the processor shall run within the boundaries. One such
> "governor" is the "userspace" governor. This one allows the user - or
> a yet-to-implement userspace program - to decide what specific speed
> the processor shall run at."
> 
> (It seems the cpufreq-hypercall + cpufreq-userspace combination is in
> line with what the userspace governor was designed for.)
> 
> Secondly, setting frequencies for multiple tasks is somewhat
> contradictory:
> 
> In the DPDK context, or in any context actually, it makes sense for a
> program to lower the processor frequency when it decides the current
> frequency is more than sufficient for the job, that is, when it can
> still handle the load at the lower frequency.
> 
> With multiple applications sharing that processor, the percentage
> of time given to a certain application also affects how quickly it
> can handle the job. So the other variable that affects "instructions
> per second" is the timeslice the scheduler gives the task, not only
> the frequency.
> 
> In that case, a task requesting a particular frequency becomes
> ambiguous: it could just as well be asking for an "increased timeslice".
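Marcelo's ambiguity argument can be made concrete with a deliberately simplified first-order model, which is an assumption and not something the thread proposes: the rate a task receives is roughly IPC times frequency times its share of CPU time. Real IPC varies with frequency (memory stalls don't scale with the clock), so the numbers are only illustrative, and the function name effective_rate is hypothetical.

```python
def effective_rate(freq_ghz: float, cpu_share: float, ipc: float = 1.0) -> float:
    """Approximate billions of instructions per second a task receives.

    Simplified model (assumption): rate = IPC * frequency * fraction of
    CPU time the scheduler gives the task.
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    return ipc * freq_ghz * cpu_share


# A task alone on a CPU running at 1.5 GHz...
alone = effective_rate(1.5, 1.0)
# ...receives the same rate as a task getting half of a 3.0 GHz CPU.
shared = effective_rate(3.0, 0.5)
```

Under this model both configurations yield the same rate, so a per-task frequency request cannot by itself say whether the task wants a faster clock or a larger timeslice.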


