Khoa and I have been discussing a workload that triggers softlockups and hung task warnings inside the guest. These warnings can pop up due to bugs in the guest Linux kernel but they can also be triggered by the hypervisor if vcpus are not being scheduled at reasonable times. I've wanted a tool that reports high stealtimes and includes the last vmexit reason. This allows us to figure out if specific I/O device emulation is taking too long or if other factors like host memory pressure are degrading guest performance. Here is a first sketch of such a tool. It's a perf-script(1) Python script which can be used to analyze perf.data files recorded with kvm:kvm_entry and kvm:kvm_exit events. Stealtimes exceeding a threshold will be flagged up: $ perf script -s /absolute/path/to/stealtime.py 100466.1062486786 9690: steal time 00000.029318914 secs, exit_reason IO_INSTRUCTION, guest_rip 0xffffffff81278f02, isa 1, info1 0xcf80003, info2 0x0 The example above shows an I/O access to 0xcf8 (PCI Configuration Space Address port) that took about 28 milliseconds. The host pid was 9690; this can be used to investigate the QEMU vcpu thread. The guest rip can be used to investigate guest code that triggered this vmexit. Given this information, it becomes possible to debug QEMU to figure out why vmexit handling is taking too long. It might be due to global mutex contention if another thread holds the global mutex while blocking. This sort of investigation needs to be done manually today but it might be possible to add perf event handlers to watch for global mutex contention inside QEMU and automatically identify the culprit. Stalls inside the kvm kernel module can also be investigated since kvm:kvm_exit events are triggered when they happen too. I wanted to share in case it is useful for others. Suggestions for better approaches welcome! Signed-off-by: Stefan Hajnoczi <stefanha@xxxxxxxxxx> --- #!/usr/bin/env python # perf script event handlers, generated by perf script -g python # Licensed under the terms of the GNU GPL License version 2 # Script to print steal times longer than a given threshold # # To collect trace data: # $ perf record -a -e kvm:kvm_entry -e kvm:kvm_exit # # To print results from trace data: # # $ perf script -s /absolute/path/to/stealtime.py # 100466.1062486786 9690: steal time 00000.029318914 secs, # exit_reason IO_INSTRUCTION, # guest_rip 0xffffffff81278f02, # isa 1, info1 0xcf80003, info2 0x0 # # The example above shows an I/O access to 0xcf8 (PCI Configuration Space # Address port) that took about 28 milliseconds. The host pid was 9690; this # can be used to investigate the QEMU vcpu thread. The guest rip can be used # to investigate guest code that triggered this vmexit. # Print steal times longer than this threshold in milliseconds THRESHOLD_MS = 100 import os import sys sys.path.append(os.environ['PERF_EXEC_PATH'] + \ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') from perf_trace_context import * from Core import * vcpu_threads = {} def trace_begin(): print 'argv:', str(sys.argv) def trace_end(): pass def kvm__kvm_exit(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, exit_reason, guest_rip, isa, info1, info2): if common_pid in vcpu_threads: last = vcpu_threads[common_pid] assert last[0] == 'kvm__kvm_entry' while last[2] > common_nsecs: common_secs -= 1 common_nsecs += 1 * 1000 * 1000 * 1000 delta_secs = common_secs - last[1] delta_nsecs = common_nsecs - last[2] vcpu_threads[common_pid] = (event_name, common_secs, common_nsecs, exit_reason, guest_rip, isa, info1, info2) def kvm__kvm_entry(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, vcpu_id): if common_pid in vcpu_threads: last = vcpu_threads[common_pid] assert last[0] == 'kvm__kvm_exit' while last[2] > common_nsecs: common_secs -= 1 common_nsecs += 1 * 1000 * 1000 * 1000 delta_secs = common_secs - last[1] delta_nsecs = common_nsecs - last[2] if delta_secs > 0 or delta_nsecs > THRESHOLD_MS * 1000 * 1000: print '%05u.%09u %u: steal time %05u.%09u secs, exit_reason %s, guest_rip %#x, isa %d, info1 %#x, info2 %#x' % ( last[1], last[2], common_pid, delta_secs, delta_nsecs, symbol_str("kvm__kvm_exit", "exit_reason", last[3]), last[4], last[5], last[6], last[7]) vcpu_threads[common_pid] = (event_name, common_secs, common_nsecs) def trace_unhandled(event_name, context, event_fields_dict): print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())]) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html