On 8/29/19 9:22 PM, Daniel Lezcano wrote:
> On 29/08/2019 21:11, Joao Martins wrote:
>> On 8/29/19 7:28 PM, Daniel Lezcano wrote:
>>> On 29/08/2019 20:07, Joao Martins wrote:
>>>> On 8/29/19 6:42 PM, Daniel Lezcano wrote:
>>>>> On 29/08/2019 19:16, Joao Martins wrote:
>>>>>> On 8/29/19 4:10 PM, Joao Martins wrote:
>>>>>>> When cpus != maxcpus, cpuidle-haltpoll will fail to register all vcpus
>>>>>>> past the online ones and thus fail to register the idle driver.
>>>>>>> This is because cpuidle_add_sysfs() will return -ENODEV as a
>>>>>>> consequence of get_cpu_device() returning no device for a non-existing
>>>>>>> CPU.
>>>>>>>
>>>>>>> Instead, switch to cpuidle_register_driver() and manually register each
>>>>>>> of the present cpus, and any that get onlined later, through a
>>>>>>> cpuhp_setup_state() callback. This mimics the similar logic in intel_idle.
>>>>>>>
>>>>>>> Fixes: fa86ee90eb11 ("add cpuidle-haltpoll driver")
>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>>>>>>> ---
>>>>>>
>>>>>> While testing the above, I found another issue in the haltpoll series.
>>>>>> But I am not sure what is best suited to the cpuidle framework, hence I am
>>>>>> requesting some advice on whether the solution below is reasonable or
>>>>>> something else is preferred.
>>>>>>
>>>>>> Essentially, since the haltpoll governor got introduced, and regardless of
>>>>>> the cpuidle driver, the default governor is going to be haltpoll for a guest
>>>>>> (given that the haltpoll governor doesn't get registered on baremetal).
>>>>>> Right now, for a KVM guest, the idle governors have these ratings:
>>>>>>
>>>>>>  * ladder -> 10
>>>>>>  * teo -> 19
>>>>>>  * menu -> 20
>>>>>>  * haltpoll -> 21
>>>>>>  * ladder + nohz=off -> 25
>>>>>>
>>>>>> When a guest is booted with MWAIT and intel_idle is probed and successfully
>>>>>> registered, we end up with the haltpoll governor being used as opposed to
>>>>>> 'menu' (which used to be the default). IIUC, this would prevent C-states
>>>>>> other than poll_state (state 0) and state 1 from being used.
>>>>>>
>>>>>> Given that the haltpoll governor is largely only useful with cpuidle-haltpoll,
>>>>>> it doesn't look reasonable for it to be the default. What about using the
>>>>>> haltpoll governor as the default when the haltpoll idle driver registers or
>>>>>> gets loaded as a module?
>>>>>
>>>>> Are the guest and host kernel the same? IOW, compiled with the same
>>>>> kernel config?
>>>>>
>>>> You just need to toggle this (regardless of CONFIG_HALTPOLL_CPUIDLE):
>>>>
>>>> CONFIG_CPU_IDLE_GOV_HALTPOLL=y
>>>>
>>>> And *if you are a KVM guest*, it will be the default (unless using nohz=off, in
>>>> which case ladder gets the highest rating -- see the listing right above).
>>>>
>>>> The host will just behave differently because the haltpoll governor checks
>>>> whether it is running as a KVM guest, and only registers in that case.
>>>
>>> I understood the problem. Actually, my question was whether the kernels
>>> are compiled for both host and guest and can be run interchangeably.
>>
>> /nods Correct.
>>
>>> In this
>>> case a runtime detection must be done as you propose; otherwise it can
>>> be done at config time. I am pretty sure it is the former, but before
>>> thinking about the runtime side, I wanted to double check.
>>>
>> Hmm, but even with separate kernels/configs for guest and host, I think we
>> would still have the same issue.
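To make the rating-based selection concrete, here is a small userspace model (not kernel code) of how cpuidle ends up with its default governor: among the governors that actually registered, the one with the highest rating wins. The ratings are the ones listed in this thread; `default_governor()` and its two flags are hypothetical, and the model ignores any governor explicitly requested at boot.

```c
/*
 * Userspace model, NOT kernel code: the default cpuidle governor is
 * whichever registered governor has the highest rating. Ratings are the
 * ones quoted in the thread for a KVM guest.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct gov {
	const char *name;
	int rating;
};

/*
 * Hypothetical helper. is_kvm_guest models the haltpoll governor only
 * registering inside a KVM guest; nohz_off models ladder raising its
 * rating from 10 to 25 when booted with nohz=off.
 */
static const char *default_governor(bool is_kvm_guest, bool nohz_off)
{
	const struct gov govs[] = {
		{ "ladder",   nohz_off ? 25 : 10 },
		{ "teo",      19 },
		{ "menu",     20 },
		{ "haltpoll", 21 },
	};
	const struct gov *best = NULL;

	for (size_t i = 0; i < sizeof(govs) / sizeof(govs[0]); i++) {
		/* On bare metal the haltpoll governor never registers. */
		if (!is_kvm_guest && strcmp(govs[i].name, "haltpoll") == 0)
			continue;
		if (best == NULL || govs[i].rating > best->rating)
			best = &govs[i];
	}
	assert(best != NULL);
	/* name points at a string literal, so it outlives the local array */
	return best->name;
}
```

With these ratings, `default_governor(true, false)` returns "haltpoll", `default_governor(true, true)` returns "ladder" and `default_governor(false, false)` returns "menu". In this model the idle driver (intel_idle or cpuidle-haltpoll) plays no part in the choice, which is the concern raised for guests that probe intel_idle.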
>>
>> What I was trying to convey is that, even when running with a config solely for
>> KVM guests (that is, different from baremetal), there are various ways of
>> idling today. An Intel x86 KVM guest can have no cpuidle driver (just the
>> arch-specific idle routine), intel_idle (like a baremetal config), or haltpoll.
>> There are use cases for all three, and it makes sense to consolidate them.
>>
>> Say you wanted a KVM-specific config: you would still see the same
>> problem if you happened to compile intel_idle together with the haltpoll
>> driver+governor.
>
> Can a guest work with an intel_idle driver?
>
Yes. If you use QEMU, you would add '-overcommit cpu-pm=on' to try it out. Of
course, this assumes a relatively recent QEMU (v3.0+) and a fairly recent kernel
version on the host (v4.17+).

>> Creating two separate configs here, with and without haltpoll
>> for VMs, doesn't sound effective for distros.
>
> Agree
>
>> Perhaps decreasing the rating of the
>> haltpoll governor? While that would be a short-term fix, it wouldn't give
>> very sensible defaults without the one-off runtime switch.
>
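A sketch of the one-off runtime switch discussed in this thread, again as a userspace model rather than kernel code. It assumes, hypothetically, that the haltpoll governor's rating is lowered below ladder's so it never wins on rating alone, and that an idle driver can name a preferred governor that is switched to when the driver registers. `guest_governor()`, its `drv_pref` argument and the rating of 9 are illustrative assumptions, not an existing cpuidle API.

```c
/*
 * Userspace model, NOT kernel code, of a KVM guest's governor choice once
 * an idle driver registers. Assumptions (hypothetical): haltpoll's rating
 * is demoted to 9, and a driver may name a preferred governor.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct guest_gov {
	const char *name;
	int rating;
};

/* Hypothetical demoted rating: below ladder (10), so never picked by rating */
#define HALTPOLL_DEMOTED_RATING 9

static const struct guest_gov guest_govs[] = {
	{ "ladder",   10 },
	{ "teo",      19 },
	{ "menu",     20 },
	{ "haltpoll", HALTPOLL_DEMOTED_RATING },
};
#define NGOVS (sizeof(guest_govs) / sizeof(guest_govs[0]))

/* Look up a registered governor by name; NULL if it is not registered. */
static const struct guest_gov *find_gov(const char *name)
{
	for (size_t i = 0; i < NGOVS; i++)
		if (strcmp(guest_govs[i].name, name) == 0)
			return &guest_govs[i];
	return NULL;
}

/*
 * drv_pref: the registering driver's preferred governor name, or NULL
 * if the driver (e.g. intel_idle) expresses no preference.
 */
static const char *guest_governor(const char *drv_pref)
{
	const struct guest_gov *best = &guest_govs[0];

	assert(NGOVS > 0);
	/* Default selection: highest rating wins. */
	for (size_t i = 1; i < NGOVS; i++)
		if (guest_govs[i].rating > best->rating)
			best = &guest_govs[i];

	/* One-off runtime switch: honour the driver's preference if registered. */
	if (drv_pref != NULL) {
		const struct guest_gov *pref = find_gov(drv_pref);
		if (pref != NULL)
			best = pref;
	}
	return best->name;
}
```

Here `guest_governor(NULL)` (a driver such as intel_idle stating no preference) falls back to "menu", while `guest_governor("haltpoll")` selects "haltpoll". Demoting the rating alone covers the first case; the per-driver preference is what would keep haltpoll paired with cpuidle-haltpoll.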