Re: [PATCH 6/6] doc/kvm: Add halt polling documentation

Wanpeng Li <kernellwp@xxxxxxxxx> · Fri, 14 Oct 2016 09:16:12 +0800

2016-10-14 8:53 GMT+08:00 Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>:
> There is currently no documentation about the halt polling capabilities
> of the kvm module. Add some documentation describing the mechanism as well
> as the module parameters to all better understanding of how halt polling
> should be used and the effect of tuning the module parameters.

How about replace "halt-polling" by "Adaptive halt-polling"? Btw,
thanks for your docs.

Regards,
Wanpeng Li

>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>
> ---
>  Documentation/virtual/kvm/00-INDEX         |   2 +
>  Documentation/virtual/kvm/halt-polling.txt | 127 +++++++++++++++++++++++++++++
>  2 files changed, 129 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/halt-polling.txt
>
> diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX
> index fee9f2b..69fe1a8 100644
> --- a/Documentation/virtual/kvm/00-INDEX
> +++ b/Documentation/virtual/kvm/00-INDEX
> @@ -6,6 +6,8 @@ cpuid.txt
>         - KVM-specific cpuid leaves (x86).
>  devices/
>         - KVM_CAP_DEVICE_CTRL userspace API.
> +halt-polling.txt
> +       - notes on halt-polling
>  hypercalls.txt
>         - KVM hypercalls.
>  locking.txt
> diff --git a/Documentation/virtual/kvm/halt-polling.txt b/Documentation/virtual/kvm/halt-polling.txt
> new file mode 100644
> index 0000000..4a84183
> --- /dev/null
> +++ b/Documentation/virtual/kvm/halt-polling.txt
> @@ -0,0 +1,127 @@
> +The KVM halt polling system
> +===========================
> +
> +The KVM halt polling system provides a feature within KVM whereby the latency
> +of a guest can, under some circumstances, be reduced by polling in the host
> +for some time period after the guest has elected to no longer run by cedeing.
> +That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
> +vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
> +before giving up the cpu to the scheduler in order to let something else run.
> +
> +Polling provides a latency advantage in cases where the guest can be run again
> +very quickly by at least saving us a trip through the scheduler, normally on
> +the order of a few micro-seconds, although performance benefits are workload
> +dependant. In the event that no wakeup source arrives during the polling
> +interval or some other task on the runqueue is runnable the scheduler is
> +invoked. Thus halt polling is especially useful on workloads with very short
> +wakeup periods where the time spent halt polling is minimised and the time
> +savings of not invoking the scheduler are distinguishable.
> +
> +The generic halt polling code is implemented in:
> +
> +       virt/kvm/kvm_main.c: kvm_vcpu_block()
> +
> +The powerpc kvm-hv specific case is implemented in:
> +
> +       arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()
> +
> +Halt Polling Interval
> +=====================
> +
> +The maximum time for which to poll before invoking the scheduler, referred to
> +as the halt polling interval, is increased and decreased based on the perceived
> +effectiveness of the polling in an attempt to limit pointless polling.
> +This value is stored in either the vcpu struct:
> +
> +       kvm_vcpu->halt_poll_ns
> +
> +or in the case of powerpc kvm-hv, in the vcore struct:
> +
> +       kvmppc_vcore->halt_poll_ns
> +
> +Thus this is a per vcpu (or vcore) value.
> +
> +During polling if a wakeup source is received within the halt polling interval,
> +the interval is left unchanged. In the event that a wakeup source isn't
> +received during the polling interval (and thus schedule is invoked) there are
> +two options, either the polling interval and total block time[0] were less than
> +the global max polling interval (see module params below), or the total block
> +time was greater than the global max polling interval.
> +
> +In the event that both the polling interval and total block time were less than
> +the global max polling interval then the polling interval can be increased in
> +the hope that next time during the longer polling interval the wake up source
> +will be received while the host is polling and the latency benefits will be
> +received. The polling interval is grown in the function grow_halt_poll_ns() and
> +is multiplied by the module parameter halt_poll_ns_grow.
> +
> +In the event that the total block time was greater than the global max polling
> +interval then the host will never poll for long enough (limited by the global
> +max) to wakeup during the polling interval so it may as well be shrunk in order
> +to avoid pointless polling. The polling interval is shrunk in the function
> +shrink_halt_poll_ns() and is divided by the module parameter
> +halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
> +
> +It is worth noting that this adjustment process attempts to hone in on some
> +steady state polling interval but will only really do a good job for wakeups
> +which come at an approximately constant rate, otherwise there will be constant
> +adjustment of the polling interval.
> +
> +[0] total block time: the time between when the halt polling function is
> +                     invoked and a wakeup source received (irrespective of
> +                     whether the scheduler is invoked within that function).
> +
> +Module Parameters
> +=================
> +
> +The kvm module has 3 tuneable module parameters to adjust the global max
> +polling interval as well as the rate at which the polling interval is grown and
> +shrunk. These variables are defined in include/linux/kvm_host.h and as module
> +parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
> +powerpc kvm-hv case.
> +
> +Module Parameter    |       Description              |      Default Value
> +--------------------------------------------------------------------------------
> +halt_poll_ns       | The global max polling interval | KVM_HALT_POLL_NS_DEFAULT
> +                   | which defines the ceiling value |
> +                   | of the polling interval for     | (per arch value)
> +                   | each vcpu.                      |
> +--------------------------------------------------------------------------------
> +halt_poll_ns_grow   | The value by which the halt     |        2
> +                   | polling interval is multiplied  |
> +                   | in the grow_halt_poll_ns()      |
> +                   | function.                       |
> +--------------------------------------------------------------------------------
> +halt_poll_ns_shrink | The value by which the halt     |        0
> +                   | polling interval is divided in  |
> +                   | the shrink_halt_poll_ns()       |
> +                   | function.                       |
> +--------------------------------------------------------------------------------
> +
> +These module parameters can be set from the debugfs files in:
> +
> +       /sys/module/kvm/parameters/
> +
> +Note: that these module parameters are system wide values and are not able to
> +      be tuned on a per vm basis.
> +
> +Further Notes
> +=============
> +
> +- Care should be taken when setting the halt_poll_ns module parameter as a
> +large value has the potential to drive the cpu usage to 100% on a machine which
> +would be almost entirely idle otherwise. This is because even if a guest has
> +wakeups during which very little work is done and which are quite far apart, if
> +the period is shorter than the global max polling interval (halt_poll_ns) then
> +the host will always poll for the entire block time and thus cpu utilisation
> +will go to 100%.
> +
> +- Halt polling essentially presents a trade off between power usage and latency
> +and the module parameters should be used to tune the affinity for this. Idle
> +cpu time is essentially converted to host kernel time with the aim of decreasing
> +latency when entering the guest.
> +
> +- Halt polling will only be conducted by the host when no other tasks are
> +runnable on that cpu, otherwise the polling will cease immediately and
> +schedule will be invoked to allow that other task to run. Thus this doesn't
> +allow a guest to denial of service the cpu.
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html