On Fri, Feb 03, 2017 at 06:40:34PM +0100, Radim Krcmar wrote: > 2017-02-02 15:47-0200, Marcelo Tosatti: > > Implement min/max/up/down frequency change > > KVM hypercalls. To be used by DPDK implementation. > > > > Also allow such hypercalls from guest userspace. > > > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> > > > > --- > > Index: kvm-pvfreq/arch/x86/kvm/x86.c > > =================================================================== > > --- kvm-pvfreq.orig/arch/x86/kvm/x86.c 2017-02-02 11:17:17.063756725 -0200 > > +++ kvm-pvfreq/arch/x86/kvm/x86.c 2017-02-02 11:17:17.822752510 -0200 > > @@ -6219,10 +6219,58 @@ > > [Here lived copy-paste.] > > > int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) > > { > > unsigned long nr, a0, a1, a2, a3, ret; > > int op_64_bit, r; > > + bool cpl_check; > > > > r = kvm_skip_emulated_instruction(vcpu); > > > > @@ -6246,7 +6294,13 @@ > > a3 &= 0xFFFFFFFF; > > } > > > > - if (kvm_x86_ops->get_cpl(vcpu) != 0) { > > + cpl_check = true; > > + if (nr == KVM_HC_FREQ_UP || nr == KVM_HC_FREQ_DOWN || > > + nr == KVM_HC_FREQ_MIN || nr == KVM_HC_FREQ_MAX) > > + if (vcpu->arch.allow_freq_hypercall == true) > > + cpl_check = false; > > + > > + if (cpl_check == true && kvm_x86_ops->get_cpl(vcpu) != 0) { > > ret = -KVM_EPERM; > > goto out; > > } > > @@ -6262,6 +6316,21 @@ > > case KVM_HC_CLOCK_PAIRING: > > ret = kvm_pv_clock_pairing(vcpu, a0, a1); > > break; > > +#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE > > CONFIG_CPU_FREQ_GOV_USERSPACE should be checked when enabling the > capability. > > > + case KVM_HC_FREQ_UP: > > + ret = kvm_pvfreq_up(vcpu); > > + break; > > + case KVM_HC_FREQ_DOWN: > > + ret = kvm_pvfreq_down(vcpu); > > + break; > > + case KVM_HC_FREQ_MAX: > > + ret = kvm_pvfreq_max(vcpu); > > + break; > > + case KVM_HC_FREQ_MIN: > > + ret = kvm_pvfreq_min(vcpu); > > + break; > > Having 4 hypercalls for this is an overkill. > You can make it one hypercall with an argument. Fine. > And the argument doesn't have to be enum {UP, DOWN, MAX, MIN}, but an > int, which would also allow you to do -2 steps. Are you suggesting to have an integer to signify the number of steps up or down. > A number over the capabilites of stepping would just map to MAX/MIN. Then MAX == any positive value above the number of steps MIN == any negative value below the negative of number of steps Sure. > Avoiding an absolute scale for interface simplifies migration, where the > guest cannot really depend much on this. Except that calling it with > MIN (INT_MIN) will get the minimum and MAX (INT_MAX) the maximum > frequency. Are you suggesting for the hypercall to return the maximum/minimum frequency if called with the highest integer and lowest negative integer respectively? (That same hypercall). Sure. > Plese explictly say in documentation that things like the number of > steps, which the guest can learn by doing MAX and then -1 until the > hypercall fails, is undefined and should not be depended upon. Sure, because it fails over migration. > Userspace might still want know the number of steps to avoid useless > hypercall -- I think we should return a different value when the limit > is reached, not just after the guest wants to go past it. Are you suggesting to return a different value when going from max-1 -> max and min+1 -> min frequencies? Fine. > > +#endif > > + > > default: > > ret = -KVM_ENOSYS; > > break; > > And thinking more about migration, userspace cannot learn the current > frequency (at least MIN/MAX), so the new host will just pick at random, > which will break userspace's expectations that it cannot increase or > decrease the frequency. Is migration left for the future, because DPDK > doesn't migrate anyway? > > Thanks. The new host should start with the highest frequency always. Then the frequency tuning algorithm can reduce frequency afterwards. Migration is a desired feature for DPDK, so it should be supported (thats one reason why virtio-net drivers are used in the guest BTW).