On Mon, Jun 10, 2019 at 5:00 PM Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote: > > On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote: > > On 6/4/2019 12:52 AM, Marcelo Tosatti wrote: > > >The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified > > >amount of time before halting. This provides the following benefits > > >to host side polling: > > > > > > 1) The POLL flag is set while polling is performed, which allows > > > a remote vCPU to avoid sending an IPI (and the associated > > > cost of handling the IPI) when performing a wakeup. > > > > > > 2) The HLT VM-exit cost can be avoided. > > > > > >The downside of guest side polling is that polling is performed > > >even with other runnable tasks in the host. > > > > > >Results comparing halt_poll_ns and server/client application > > >where a small packet is ping-ponged: > > > > > >host --> 31.33 > > >halt_poll_ns=300000 / no guest busy spin --> 33.40 (93.8%) > > >halt_poll_ns=0 / guest_halt_poll_ns=300000 --> 32.73 (95.7%) > > > > > >For the SAP HANA benchmarks (where idle_spin is a parameter > > >of the previous version of the patch, results should be the > > >same): > > > > > >hpns == halt_poll_ns > > > > > > idle_spin=0/ idle_spin=800/ idle_spin=0/ > > > hpns=200000 hpns=0 hpns=800000 > > >DeleteC06T03 (100 thread) 1.76 1.71 (-3%) 1.78 (+1%) > > >InsertC16T02 (100 thread) 2.14 2.07 (-3%) 2.18 (+1.8%) > > >DeleteC00T01 (1 thread) 1.34 1.28 (-4.5%) 1.29 (-3.7%) > > >UpdateC00T03 (1 thread) 4.72 4.18 (-12%) 4.53 (-5%) > > > > > >V2: > > > > > >- Move from x86 to generic code (Paolo/Christian). > > >- Add auto-tuning logic (Paolo). > > >- Add MSR to disable host side polling (Paolo). > > > > > > > > > > > First of all, please CC power management patches (including cpuidle, > > cpufreq etc) to linux-pm@xxxxxxxxxxxxxxx (there are people on that > > list who may want to see your changes before they go in) and CC > > cpuidle material (in particular) to Peter Zijlstra. > > > > Second, I'm not a big fan of this approach to be honest, as it kind > > of is a driver trying to play the role of a governor. > > > > We have a "polling state" already that could be used here in > > principle so I wonder what would be wrong with that. Also note that > > there seems to be at least some code duplication between your code > > and the "polling state" implementation, so maybe it would be > > possible to do some things in a common way? > > Hi Rafael, > > After modifying poll_state.c to use a generic "poll time" driver > callback [1] (since using a variable "target_residency" for that > looks really ugly), would need a governor which does: > > haltpoll_governor_select_next_state() > if (prev_state was poll and evt happened on prev poll window) -> POLL. > if (prev_state == HLT) -> POLL > otherwise -> HLT > > And a "default_idle" cpuidle driver that: > > defaultidle_idle() > if (current_clr_polling_and_test()) { > local_irq_enable(); > return index; > } > default_idle(); > return > > Using such governor with any other cpuidle driver would > be pointless (since it would enter the first state only > and therefore not save power). > > Not certain about using the default_idle driver with > other governors: one would rather use a driver that > supports all states on a given machine. > > This combination of governor/driver pair, for the sake > of sharing the idle loop, seems awkward to me. > And fails the governor/driver separation: one will use the > pair in practice. > > But i have no problem with it, so i'll proceed with that. > > Let me know otherwise. If my understanding of your argumentation is correct, it is only necessary to take the default_idle_call() branch of cpuidle_idle_call() in the VM case, so it should be sufficient to provide a suitable default_idle_call() which is what you seem to be trying to do. I might have been confused by the terminology used in the patch series if that's the case. Also, if that's the case, this is not cpuidle matter really. It is a matter of providing a better default_idle_call() for the arch at hand. Thanks, Rafael