On Tue, Jun 11, 2019 at 4:27 PM Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote: > > On Tue, Jun 11, 2019 at 12:03:26AM +0200, Rafael J. Wysocki wrote: > > On Mon, Jun 10, 2019 at 5:00 PM Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote: > > > > > > On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote: > > > > On 6/4/2019 12:52 AM, Marcelo Tosatti wrote: > > > > >The cpuidle-haltpoll driver allows the guest vcpus to poll for a specified > > > > >amount of time before halting. This provides the following benefits > > > > >to host side polling: > > > > > > > > > > 1) The POLL flag is set while polling is performed, which allows > > > > > a remote vCPU to avoid sending an IPI (and the associated > > > > > cost of handling the IPI) when performing a wakeup. > > > > > > > > > > 2) The HLT VM-exit cost can be avoided. > > > > > > > > > >The downside of guest side polling is that polling is performed > > > > >even with other runnable tasks in the host. > > > > > > > > > >Results comparing halt_poll_ns and server/client application > > > > >where a small packet is ping-ponged: > > > > > > > > > >host --> 31.33 > > > > >halt_poll_ns=300000 / no guest busy spin --> 33.40 (93.8%) > > > > >halt_poll_ns=0 / guest_halt_poll_ns=300000 --> 32.73 (95.7%) > > > > > > > > > >For the SAP HANA benchmarks (where idle_spin is a parameter > > > > >of the previous version of the patch, results should be the > > > > >same): > > > > > > > > > >hpns == halt_poll_ns > > > > > > > > > > idle_spin=0/ idle_spin=800/ idle_spin=0/ > > > > > hpns=200000 hpns=0 hpns=800000 > > > > >DeleteC06T03 (100 thread) 1.76 1.71 (-3%) 1.78 (+1%) > > > > >InsertC16T02 (100 thread) 2.14 2.07 (-3%) 2.18 (+1.8%) > > > > >DeleteC00T01 (1 thread) 1.34 1.28 (-4.5%) 1.29 (-3.7%) > > > > >UpdateC00T03 (1 thread) 4.72 4.18 (-12%) 4.53 (-5%) > > > > > > > > > >V2: > > > > > > > > > >- Move from x86 to generic code (Paolo/Christian). > > > > >- Add auto-tuning logic (Paolo). > > > > >- Add MSR to disable host side polling (Paolo). > > > > > > > > > > > > > > > > > > > First of all, please CC power management patches (including cpuidle, > > > > cpufreq etc) to linux-pm@xxxxxxxxxxxxxxx (there are people on that > > > > list who may want to see your changes before they go in) and CC > > > > cpuidle material (in particular) to Peter Zijlstra. > > > > > > > > Second, I'm not a big fan of this approach to be honest, as it kind > > > > of is a driver trying to play the role of a governor. > > > > > > > > We have a "polling state" already that could be used here in > > > > principle so I wonder what would be wrong with that. Also note that > > > > there seems to be at least some code duplication between your code > > > > and the "polling state" implementation, so maybe it would be > > > > possible to do some things in a common way? > > > > > > Hi Rafael, > > > > > > After modifying poll_state.c to use a generic "poll time" driver > > > callback [1] (since using a variable "target_residency" for that > > > looks really ugly), would need a governor which does: > > > > > > haltpoll_governor_select_next_state() > > > if (prev_state was poll and evt happened on prev poll window) -> POLL. > > > if (prev_state == HLT) -> POLL > > > otherwise -> HLT > > > > > > And a "default_idle" cpuidle driver that: > > > > > > defaultidle_idle() > > > if (current_clr_polling_and_test()) { > > > local_irq_enable(); > > > return index; > > > } > > > default_idle(); > > > return > > > > > > Using such governor with any other cpuidle driver would > > > be pointless (since it would enter the first state only > > > and therefore not save power). > > > > > > Not certain about using the default_idle driver with > > > other governors: one would rather use a driver that > > > supports all states on a given machine. > > > > > > This combination of governor/driver pair, for the sake > > > of sharing the idle loop, seems awkward to me. > > > And fails the governor/driver separation: one will use the > > > pair in practice. > > > > > > But i have no problem with it, so i'll proceed with that. > > > > > > Let me know otherwise. > > > > If my understanding of your argumentation is correct, it is only > > necessary to take the default_idle_call() branch of > > cpuidle_idle_call() in the VM case, so it should be sufficient to > > provide a suitable default_idle_call() which is what you seem to be > > trying to do. > > In the VM case, we need to poll before actually halting (this is because > its tricky to implement MWAIT in guests, so polling for some amount > of time allows the IPI avoidance optimization, > see trace_sched_wake_idle_without_ipi, to take place). > > The amount of time we poll is variable and adjusted (see adjust_haltpoll_ns > in the patchset). > > > I might have been confused by the terminology used in the patch series > > if that's the case. > > > > Also, if that's the case, this is not cpuidle matter really. It is a > > matter of providing a better default_idle_call() for the arch at hand. > > Peter Zijlstra suggested a cpuidle driver for this. So I wonder what his rationale was. > Also, other architectures will use the same "poll before exiting to VM" > logic (so we'd rather avoid duplicating this code): PPC, x86, S/390, > MIPS... So in my POV it makes sense to unify this. The logic is fine IMO, but the implementation here is questionable. > So, back to your initial suggestion: > > Q) "Can you unify code with poll_state.c?" > A) Yes, but it requires a new governor, which seems overkill and unfit > for the purpose. > > Moreover, the logic in menu to decide whether its necessary or not > to stop sched tick is useful for us (so a default_idle_call is not > sufficient), because the cost of enabling/disabling the sched tick is > high on VMs. So in fact you need a governor, but you really only need it to decide whether or not to stop the tick for you. menu has a quite high overhead for that. :-) > So i'll fix the comments of the cpuidle driver (which everyone seems > to agree with, except your understandable distate for it) and repost.