Re: Investigating abnormal stealtimes

Nice!

- 'Steal time' is the amount of time during which the vcpu is runnable
but not actually running. Since the script measures the time between
vmexit and vmentry, maybe 'vmexit latency' is a better name.

- Perhaps it would be good to subtract the time the thread was
involuntarily scheduled out due to 'timeslice' expiration. Otherwise,
running a CPU-intensive task returns false positives, that is, long
delays caused by a reschedule after guest CPU activity exhausted the
'timeslice', not by KVM or QEMU issues such as voluntarily scheduling
out in pthread_mutex_lock. (A sketch of this follows below.)

Alternatively, you can raise the priority of the vcpu threads to get
rid of the false positives.
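
For the subtraction idea, something like this could work (a rough
sketch, untested: it assumes the trace also contains sched:sched_switch
events, reuses the vcpu_threads dict from your script, and the
sched_out_at/sched_out_ns names are made up; the handler signature is
what perf script -g python generates here, but it can differ between
perf versions):

# Requires recording the scheduler events as well:
# $ perf record -a -e kvm:kvm_entry -e kvm:kvm_exit -e sched:sched_switch

sched_out_at = {}  # pid -> timestamp (ns) when the thread was switched out
sched_out_ns = {}  # pid -> nanoseconds spent scheduled out since last reset

def sched__sched_switch(event_name, context, common_cpu,
        common_secs, common_nsecs, common_pid, common_comm,
        prev_comm, prev_pid, prev_prio, prev_state,
        next_comm, next_pid, next_prio):
    now = common_secs * 1000 * 1000 * 1000 + common_nsecs
    if prev_pid in vcpu_threads:
        # A vcpu thread is leaving the CPU
        sched_out_at[prev_pid] = now
    if next_pid in sched_out_at:
        # A vcpu thread is running again: accumulate the time it
        # spent scheduled out
        start = sched_out_at.pop(next_pid)
        sched_out_ns[next_pid] = sched_out_ns.get(next_pid, 0) + (now - start)

# kvm__kvm_entry would then subtract the accumulated time before
# comparing against THRESHOLD_MS, e.g.:
#     delta_nsecs -= sched_out_ns.pop(common_pid, 0)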

- Idea: it would be handy to extract the trace events in the offending
'latency above threshold' vmexit/vmentry region. Say you enable other
trace events (unrelated to kvm) which can help identify the culprit.
Instead of manually scanning the file for "100466.1062486786", save
one vmexit/vmentry cycle, along with the other trace events in that
period, to a separate file, as sketched below.
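
For example (again just a sketch: recent_events and dump_region are
made-up names, and I am assuming the event_fields_dict passed to
trace_unhandled carries the 'common_s'/'common_ns' timestamp fields,
which may vary between perf versions):

import collections

# Ring buffer of recently seen events: (timestamp in ns, description)
recent_events = collections.deque(maxlen=100000)

def trace_unhandled(event_name, context, event_fields_dict):
    ns = event_fields_dict.get('common_s', 0) * 1000 * 1000 * 1000 + \
         event_fields_dict.get('common_ns', 0)
    fields = ' '.join('%s=%s' % (k, v)
                      for k, v in sorted(event_fields_dict.items()))
    recent_events.append((ns, '%s %s' % (event_name, fields)))

def dump_region(start_ns, end_ns, path):
    # Save one vmexit/vmentry cycle, along with the other trace
    # events in that period, to a separate file
    f = open(path, 'w')
    for ns, text in recent_events:
        if start_ns <= ns <= end_ns:
            f.write('%u.%09u %s\n' % (ns / (1000 * 1000 * 1000),
                                      ns % (1000 * 1000 * 1000), text))
    f.close()

# kvm__kvm_entry would call this when the threshold is exceeded, with
# the timestamps of the offending kvm_exit/kvm_entry pair:
#     dump_region(exit_ns, entry_ns, 'region-%u.txt' % exit_ns)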

On Tue, Jan 29, 2013 at 01:27:12PM +0100, Stefan Hajnoczi wrote:
> Khoa and I have been discussing a workload that triggers softlockups and
> hung task warnings inside the guest.  These warnings can pop up due to
> bugs in the guest Linux kernel but they can also be triggered by the
> hypervisor if vcpus are not being scheduled at reasonable times.
> 
> I've wanted a tool that reports high stealtimes and includes the last
> vmexit reason.  This allows us to figure out if specific I/O device
> emulation is taking too long or if other factors like host memory
> pressure are degrading guest performance.
> 
> Here is a first sketch of such a tool.  It's a perf-script(1) Python
> script which can be used to analyze perf.data files recorded with
> kvm:kvm_entry and kvm:kvm_exit events.
> 
> Stealtimes exceeding a threshold will be flagged up:
> 
>   $ perf script -s /absolute/path/to/stealtime.py
>   100466.1062486786 9690: steal time 00000.029318914 secs,
>                           exit_reason IO_INSTRUCTION,
>                           guest_rip 0xffffffff81278f02,
>                           isa 1, info1 0xcf80003, info2 0x0
> 
> The example above shows an I/O access to 0xcf8 (PCI Configuration Space
> Address port) that took about 28 milliseconds.  The host pid was 9690; this
> can be used to investigate the QEMU vcpu thread.  The guest rip can be used
> to investigate guest code that triggered this vmexit.
> 
> Given this information, it becomes possible to debug QEMU to figure out
> why vmexit handling is taking too long.  It might be due to global mutex
> contention if another thread holds the global mutex while blocking.
> This sort of investigation needs to be done manually today but it might
> be possible to add perf event handlers to watch for global mutex
> contention inside QEMU and automatically identify the culprit.
> 
> Stalls inside the kvm kernel module can also be investigated since
> kvm:kvm_exit events are triggered when they happen too.
> 
> I wanted to share in case it is useful for others.  Suggestions for
> better approaches welcome!
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@xxxxxxxxxx>
> 
> ---
> 
> #!/usr/bin/env python
> # perf script event handlers, generated by perf script -g python
> # Licensed under the terms of the GNU GPL License version 2
> 
> # Script to print steal times longer than a given threshold
> #
> # To collect trace data:
> # $ perf record -a -e kvm:kvm_entry -e kvm:kvm_exit
> #
> # To print results from trace data:
> #
> # $ perf script -s /absolute/path/to/stealtime.py
> # 100466.1062486786 9690: steal time 00000.029318914 secs,
> #                         exit_reason IO_INSTRUCTION,
> #                         guest_rip 0xffffffff81278f02,
> #                         isa 1, info1 0xcf80003, info2 0x0
> #
> # The example above shows an I/O access to 0xcf8 (PCI Configuration Space
> # Address port) that took about 28 milliseconds.  The host pid was 9690; this
> # can be used to investigate the QEMU vcpu thread.  The guest rip can be used
> # to investigate guest code that triggered this vmexit.
> 
> # Print steal times longer than this threshold in milliseconds
> THRESHOLD_MS = 100
> 
> import os
> import sys
> 
> sys.path.append(os.environ['PERF_EXEC_PATH'] +
>                 '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
> 
> from perf_trace_context import *
> from Core import *
> 
> # Most recent event for each vcpu thread, keyed by host pid:
> # ('kvm__kvm_entry', secs, nsecs) or
> # ('kvm__kvm_exit', secs, nsecs, exit_reason, guest_rip, isa, info1, info2)
> vcpu_threads = {}
> 
> def trace_begin():
>     print 'argv:', str(sys.argv)
> 
> def trace_end():
>     pass
> 
> def kvm__kvm_exit(event_name, context, common_cpu,
>         common_secs, common_nsecs, common_pid, common_comm,
>         exit_reason, guest_rip, isa, info1,
>         info2):
> 
>     if common_pid in vcpu_threads:
>         last = vcpu_threads[common_pid]
>         assert last[0] == 'kvm__kvm_entry'
> 
>     # Remember the exit timestamp and details so the next kvm_entry
>     # for this thread can calculate and report the steal time
>     vcpu_threads[common_pid] = (event_name, common_secs, common_nsecs,
>                                 exit_reason, guest_rip, isa, info1, info2)
> 
> def kvm__kvm_entry(event_name, context, common_cpu,
>         common_secs, common_nsecs, common_pid, common_comm,
>         vcpu_id):
> 
>     if common_pid in vcpu_threads:
>         last = vcpu_threads[common_pid]
>         assert last[0] == 'kvm__kvm_exit'
> 
>         # Borrow from the seconds field so that the nanosecond
>         # subtraction below cannot go negative
>         while last[2] > common_nsecs:
>             common_secs -= 1
>             common_nsecs += 1000 * 1000 * 1000
>         delta_secs = common_secs - last[1]
>         delta_nsecs = common_nsecs - last[2]
> 
>         if delta_secs > 0 or delta_nsecs > THRESHOLD_MS * 1000 * 1000:
>             print '%05u.%09u %u: steal time %05u.%09u secs, exit_reason %s, guest_rip %#x, isa %d, info1 %#x, info2 %#x' % (
>                     last[1], last[2],
>                     common_pid,
>                     delta_secs, delta_nsecs,
>                     symbol_str("kvm__kvm_exit", "exit_reason", last[3]),
>                     last[4],
>                     last[5],
>                     last[6],
>                     last[7])
> 
>     vcpu_threads[common_pid] = (event_name, common_secs, common_nsecs)
> 
> def trace_unhandled(event_name, context, event_fields_dict):
>     print ' '.join('%s=%s' % (k, str(v))
>                    for k, v in sorted(event_fields_dict.items()))

