On Wed, Mar 01, 2017 at 03:21:32PM +0100, Paolo Bonzini wrote:
> 
> 
> On 28/02/2017 03:45, Marcelo Tosatti wrote:
> > On Fri, Feb 24, 2017 at 04:34:52PM +0100, Paolo Bonzini wrote:
> >> 
> >> 
> >> On 24/02/2017 14:04, Marcelo Tosatti wrote:
> >>>>>>> Whats the current usecase, or forseeable future usecase, for save/restore
> >>>>>>> across preemption again? (which would validate the broken by design
> >>>>>>> claim).
> >>>>>> Stop a guest that is using cpufreq, start a guest that is not using it.
> >>>>>> The second guest's performance now depends on the state that the first
> >>>>>> guest left in cpufreq.
> >>>>> Nothing forbids the host to implement switching with the
> >>>>> current hypercall interface: all you need is a scheduler
> >>>>> hook.
> >>>> Can it be done in vcpu_load/vcpu_put? But you still would have two
> >>>> components (KVM and sysfs) potentially fighting over the frequency, and
> >>>> that's still a bit ugly.
> >>> 
> >>> Change the frequency at vcpu_load/vcpu_put? Yes: call into
> >>> cpufreq-userspace. But there is no notion of "per-task frequency" on the
> >>> Linux kernel (which was the starting point of this subthread).
> >> 
> >> There isn't, but this patchset is providing a direct path from a task to
> >> cpufreq-userspace. This is as close as you can get to a per-task frequency.
> > 
> > Cpufreq-userspace is supposed to be used by tasks in userspace.
> > Thats why its called "userspace".
> 
> I think the intended usecase is to have a daemon handling a systemwide
> policy. Examples are the historical (and now obsolete) users such as
> cpufreqd, cpudyn, powernowd, or cpuspeed. The user alternatively can
> play the role of the daemon by writing to sysfs.
> 
> I've never seen userspace tasks talking to cpufreq-userspace to set
> their own running frequency. If DPDK does it, that's nasty in my
> opinion

Please extend what "nasty" means in detail. I really don't understand
why its nasty.

> and we should find an interface that works best for both DPDK
> and KVM.

Which should be done on linux-pm like Rafael suggested.

> 
> >>> But if you configure all CPUs in the system as cpufreq-userspace,
> >>> then some other (userspace program) has to decide the frequency
> >>> for the other CPUs.
> >>> 
> >>> Which agent would do that and why? Thats why i initially said "whats the
> >>> usecase".
> >> 
> >> You could just pin them at the highest non-TurboBoost frequency until a
> >> guest runs. That's assuming that they are idle and, because of
> >> isol_cpus/nohz_full, they would be almost always in deep C state anyway.
> > 
> > The original claim of the thread was: "this feature (frequency
> > hypercalls) works for pinned vcpu<->pcpu, pcpu dedicated exclusively
> > to vcpu case, lets try to extend this to other cases".
> > 
> > Which is a valid and useful direction to go.
> > 
> > However there is no user for multiple vcpus in the same pcpu now.
> 
> You are still ignoring the case of one guest started after another, or
> of another program started on a CPU that formerly was used by KVM. They
> don't have to be multiple users at the same time.

Just have the cpufreq-userspace policy be instantiated while the
isolated vcpu owns the pcpu. Before/after that, the previous policy is
in place.

> > If there were multiple vcpus, all of them requesting a given
> > frequency, it would be necessary to:
> > 
> > 1) Maintain frequency of the pcpu to the highest
> > frequencies.
> > 
> > OR
> > 
> > 2) Since switching frequencies can take up to 70us (*)
> > (depends on processor), its generally not worthwhile
> > to switch frequencies between task switches.
> 
> Is latency that important, or is rather overhead the one to pay
> attention to? The slides you linked
> (http://www.ena-hpc.org/2013/pdf/04.pdf) at page 17 suggest it's around
> 10us.

Ok, be it 10us. 10us overhead on every task context switch is not
acceptable.
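
[Editor's sketch] The idea above of having the cpufreq-userspace policy in
place only while the isolated vcpu owns the pcpu, with the previous policy
restored before and after, can be illustrated from userspace through the
cpufreq sysfs files scaling_governor and scaling_setspeed. This is only a
rough sketch under stated assumptions, not what the patchset does: the helper
name userspace_governor and its root parameter are hypothetical, a real hook
would live in the kernel around vcpu_load/vcpu_put, and writing these files
on a live system needs root.

```python
import contextlib
from pathlib import Path

# Standard cpufreq sysfs location on Linux. It is a parameter below so the
# sketch can be tried against a scratch directory instead of a live system.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


@contextlib.contextmanager
def userspace_governor(cpu: int, khz: int, root: Path = SYSFS_CPU_ROOT):
    """Hold `cpu` at `khz` under the userspace governor, then restore.

    Hypothetical helper for illustration only.
    """
    policy = root / f"cpu{cpu}" / "cpufreq"
    governor_file = policy / "scaling_governor"
    previous = governor_file.read_text().strip()  # remember the old policy
    governor_file.write_text("userspace\n")
    try:
        # scaling_setspeed is only meaningful under the userspace governor.
        (policy / "scaling_setspeed").write_text(f"{khz}\n")
        yield previous
    finally:
        # Put the previous policy back once the pcpu is released.
        governor_file.write_text(previous + "\n")
```

Used as "with userspace_governor(3, 2400000): ...", the body of the with
block stands in for the period during which the vcpu owns the pcpu.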

> One possibility is to do (1) if you have multiple tasks on the run queue
> (or fallback to what is specified in sysfs) and (2) if you only have one
> task.

Sure, that is alright. But the use-case at hand does not involve
multiple tasks on the pcpu.

> Anyway, please repost with Cc to linux-pm so that we can restart the
> discussion there.
> 
> Paolo

Done. Can you please reply with a concise summary of what you object to?
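
[Editor's sketch] One possible reading of the combination discussed above
(hold the highest requested frequency when several tasks share the run queue,
fall back to the sysfs setting when nothing was requested, and honour the
request directly when there is a single task) is sketched below. The function
name pick_pcpu_khz, the use of None for "no request", and the choice to
ignore non-requesting tasks when taking the maximum are all assumptions made
for illustration; the thread does not specify them.

```python
from typing import Optional, Sequence


def pick_pcpu_khz(requests_khz: Sequence[Optional[int]],
                  sysfs_default_khz: int) -> int:
    """Choose one frequency for a pcpu from its runnable tasks' requests.

    `requests_khz` has one entry per runnable task; None means the task
    made no frequency request (hypothetical representation).
    """
    asked = [r for r in requests_khz if r is not None]
    if not asked:
        # Nobody asked for anything: fall back to the sysfs setting.
        return sysfs_default_khz
    if len(requests_khz) == 1:
        # Single task on the run queue: honour its request directly.
        return asked[0]
    # Several tasks share the pcpu: hold the highest requested frequency
    # rather than reprogramming the frequency on every task switch.
    return max(asked)
```

With this policy the frequency only changes when the set of runnable tasks
changes, not on each context switch, which is the cost objected to above.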