Steal time accounts for the time during which a guest vcpu was ready to
run, but was not scheduled to run by the hypervisor. This is
particularly relevant in cloud environments, where customers want to use
it as an indicator that their guests are being throttled. However, as it
stands today, guest steal time information is not visible from the
hypervisor.

For cloud service providers, this is problematic: they want to
overcommit cpu resources to achieve optimum resource utilization while
at the same time ensuring that guests are not throttled. It is useful
for service providers to have access to the guest steal time data so
that they can base their overcommit/guest packing decisions on it.
Higher guest steal time can be used as a trigger to change how the
guests are scheduled, or even to migrate guests out of a system.

This patchset attempts to make guest steal times available in the host.
This is achieved by introducing a new field in the per-task statistics
(/proc/<pid>/stat and /proc/<pid>/task/<tid>/stat) to accumulate
per-vcpu steal time. Programs (such as pidstat) can then be enhanced to
report this information on a per-thread basis [If there is a better
place/way to expose this, please let me know].

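To illustrate how a userspace tool might consume such a per-task field,
here is a minimal Python sketch. It is not taken from the patches or
from pidstat. It assumes (hypothetically) that the new steal counter is
appended as the last field of the stat line and that the caller knows
the counter's unit; the cover letter specifies neither, so both are
labeled as assumptions in the code. The helper names are made up for
this example.

```python
def parse_stat_line(line):
    """Split a /proc/<pid>/stat style line into (pid, comm, other fields).

    comm is enclosed in parentheses and may itself contain spaces or ')',
    so locate it via the first '(' and the last ')'.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    fields = line[rparen + 1:].split()
    return pid, comm, fields


def steal_units(line):
    """Return the cumulative steal counter from a stat line.

    ASSUMPTION: the new counter is the last field of the line. The real
    field position is defined by the procfs patch, not by this sketch.
    """
    _, _, fields = parse_stat_line(line)
    return int(fields[-1])


def read_thread_steal(pid, tid):
    """Read the (assumed) steal counter for one thread of a process."""
    path = "/proc/%d/task/%d/stat" % (pid, tid)
    with open(path) as f:
        return steal_units(f.read())


def steal_percent(prev, cur, interval_s, units_per_s):
    """Convert two samples of a cumulative counter into a percentage.

    ASSUMPTION: units_per_s is the number of counter units in one second
    (for example, clock ticks per second if the counter is in ticks).
    """
    if interval_s <= 0 or units_per_s <= 0:
        raise ValueError("interval_s and units_per_s must be positive")
    return 100.0 * (cur - prev) / (interval_s * units_per_s)
```

A monitoring tool would call read_thread_steal() for each vcpu thread at
two points in time and pass both samples to steal_percent() to obtain a
per-thread %steal figure for that interval.
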
As an example, with pidstat on ppc64:

Guest steal time information using mpstat:
-----------------------------------------
[root@rhel7-img ~]# mpstat -P ALL 1
Linux 3.19.0nnr (rhel7-img)     04/15/2015      _ppc64_ (4 CPU)

03:13:23 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:24 PM  all   12.25    0.00    1.25    0.00    1.00    2.25   13.75    0.00    0.00   69.50
03:13:24 PM    0   46.53    0.00    0.00    0.00    0.00    4.95   45.54    0.00    0.00    2.97
03:13:24 PM    1    0.00    0.00    0.00    0.00    0.00    4.04    3.03    0.00    0.00   92.93
03:13:24 PM    2    0.00    0.00    0.00    0.00    3.96    0.99    2.97    0.00    0.00   92.08
03:13:24 PM    3    3.00    0.00    4.00    0.00    0.00    0.00    4.00    0.00    0.00   89.00

03:13:24 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:25 PM  all   12.59    0.00    0.00    0.00    0.00    0.25   12.35    0.00    0.00   74.81
03:13:25 PM    0   50.00    0.00    0.00    0.00    0.00    0.98   49.02    0.00    0.00    0.00
03:13:25 PM    1    0.98    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.02
03:13:25 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:13:25 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:13:25 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:26 PM  all   12.99    0.00    0.00    0.00    0.25    0.00   12.75    0.00    0.00   74.02
03:13:26 PM    0   51.96    0.00    0.00    0.00    0.00    0.00   48.04    0.00    0.00    0.00
03:13:26 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:13:26 PM    2    0.00    0.00    0.00    0.00    0.98    0.00    2.94    0.00    0.00   96.08
03:13:26 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:13:26 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:27 PM  all   12.53    0.00    1.00    0.25    0.00    0.25   12.03    0.00    0.00   73.93
03:13:27 PM    0   51.02    0.00    0.00    0.00    0.00    0.00   48.98    0.00    0.00    0.00
03:13:27 PM    1    0.00    0.00    4.04    0.00    0.00    0.00    0.00    0.00    0.00   95.96
03:13:27 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:13:27 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all   12.91    0.00    0.54    0.01    0.04    0.12   12.39    0.00    0.00   74.00
Average:       0   51.36    0.00    0.03    0.00    0.03    0.26   48.27    0.00    0.00    0.05
Average:       1    0.02    0.00    1.54    0.02    0.02    0.15    0.36    0.00    0.00   97.89
Average:       2    0.00    0.00    0.52    0.00    0.09    0.02    0.36    0.00    0.00   99.02
Average:       3    0.05    0.00    0.07    0.00    0.02    0.09    0.34    0.00    0.00   99.43

Steal time information in host using (locally modified) pidstat:
---------------------------------------------------------------
[naveen@xxxxxxxxxx sysstat]$ ./pidstat -C qemu -tIu 1
Linux 3.19.0nnr (xxxxxxxxxx.in.ibm.com)         04/15/2015      _ppc64_ (64 CPU)

04:43:20 AM   UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
04:43:22 AM  1008      3001         -    0.00    0.00   54.21    3.39   45.79    12  qemu-system-ppc
04:43:22 AM  1008         -      3005    0.00    0.00   54.21    3.39    0.00    12  |__qemu-system-ppc

04:43:22 AM   UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
04:43:23 AM  1008      3001         -    0.00    0.00   52.00    3.25   46.00    12  qemu-system-ppc
04:43:23 AM  1008         -      3003    0.00    0.00    2.00    0.12   46.00    12  |__qemu-system-ppc
04:43:23 AM  1008         -      3005    0.00    0.00   45.00    2.81    0.00    12  |__qemu-system-ppc
04:43:23 AM  1008         -      3006    0.00    0.00    6.00    0.38    0.00    12  |__qemu-system-ppc

04:43:23 AM   UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
04:43:24 AM  1008      3001         -    0.00    2.00   50.00    3.25   67.00    12  qemu-system-ppc
04:43:24 AM  1008         -      3001    0.00    1.00    0.00    0.06    0.00    12  |__qemu-system-ppc
04:43:24 AM  1008         -      3003    0.00    0.00    8.00    0.50   49.00    12  |__qemu-system-ppc
04:43:24 AM  1008         -      3004    0.00    0.00    2.00    0.12    5.00    12  |__qemu-system-ppc
04:43:24 AM  1008         -      3005    0.00    0.00   38.00    2.38    3.00    12  |__qemu-system-ppc
04:43:24 AM  1008         -      3006    0.00    1.00    0.00    0.06    8.00    12  |__qemu-system-ppc

04:43:24 AM   UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
04:43:25 AM  1008      3001         -    0.00    0.00   51.00    3.19   47.00    12  qemu-system-ppc
04:43:25 AM  1008         -      3003    0.00    0.00   27.00    1.69   47.00    12  |__qemu-system-ppc
04:43:25 AM  1008         -      3004    0.00    1.00    0.00    0.06    0.00    12  |__qemu-system-ppc
04:43:25 AM  1008         -      3005    0.00    1.00   23.00    1.50    0.00    12  |__qemu-system-ppc
04:43:25 AM  1008         -      3006    0.00    0.00    2.00    0.12    0.00    12  |__qemu-system-ppc

04:43:25 AM   UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
04:43:26 AM  1008      3001         -    0.00    0.00   51.00    3.18   53.00    12  qemu-system-ppc
04:43:26 AM  1008         -      3003    0.00    0.00    9.00    0.56   50.00    12  |__qemu-system-ppc
04:43:26 AM  1008         -      3005    0.00    0.00   16.00    1.00    3.00    12  |__qemu-system-ppc
04:43:26 AM  1008         -      3006    0.00    0.00   26.00    1.62    0.00    12  |__qemu-system-ppc

Average:      UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
Average:     1008      3001         -    0.00    0.18   51.54    3.23   50.12     -  qemu-system-ppc
Average:     1008         -      3001    0.02    0.02    0.00    0.00    0.00     -  |__qemu-system-ppc
Average:     1008         -      3003    0.00    0.03   15.89    0.99   48.24     -  |__qemu-system-ppc
Average:     1008         -      3004    0.00    0.05   11.70    0.73    0.56     -  |__qemu-system-ppc
Average:     1008         -      3005    0.00    0.06   20.03    1.26    0.58     -  |__qemu-system-ppc
Average:     1008         -      3006    0.00    0.03    3.93    0.25    0.72     -  |__qemu-system-ppc

On x86, we can obtain accurate steal time information since it is just
the scheduler run_delay. However, on powerpc, obtaining accurate steal
time information is challenging. This patchset proposes a technique that
allows us to obtain a reasonable (+/- 5%) approximation. Please suggest
if there are better ways to achieve more accurate steal time accounting
in the hypervisor.

I am also interested in general feedback on the overall patchset and my
approach for the same.

Thanks!
- Naveen


Naveen N. Rao (3):
  procfs: add guest steal time in /proc/<pid>/stat
  kvm/x86: report guest steal time in host
  kvm/powerpc: report guest steal time in host

 arch/powerpc/include/asm/kvm_host.h     | 1 +
 arch/powerpc/kernel/asm-offsets.c       | 1 +
 arch/powerpc/kvm/book3s_hv.c            | 2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 3 +++
 arch/x86/kvm/x86.c                      | 1 +
 fs/proc/array.c                         | 6 ++++++
 include/linux/sched.h                   | 7 +++++++
 kernel/fork.c                           | 2 +-
 8 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.3.5