Very recently, trace-cmd got a few new features that allow you to merge
host and guest kernel traces using the host TSC. Those features
originated in the tracing we're doing to debug spikes in real-time KVM.

However, as real-time KVM uses a very specific setup and as we have so
far debugged a very simple application, I'm wondering: are these
features useful for general, non-real-time use-cases? If the answer is
yes, then I've got several ideas on how to make host and guest trace
merging extremely simple to use.

I'll first describe how we do tracing for real-time KVM. Then I'll give
some suggestions on how to use the same procedure for unpinned
use-cases. Lastly, I'll talk about how we could make it easy to use.

Real-time KVM host and guest tracing
====================================

In real-time KVM, each guest vCPU is pinned to a different host core.
The real-time application running in the guest is also pinned. When we
get a spike, we know in which guest CPU it occurred, and so we know in
which host core that CPU was running. All we have to do is to get a
trace of that guest CPU/host core pair.

1. Setup
--------

You'll need the following:

1. A stable TSC
2. The TSC offset of the guest you want to debug (see below)
3. A way for your guest to transfer a file to the host (I use networking)
4. Latest trace-cmd.git in both host and guest (HEAD c21aae2c or later)

To get the TSC offset of the guest, you can use the kvm_write_tsc_offset
tracepoint in the host. I use this script to do it:

http://people.redhat.com/~lcapitul/real-time/start-debug-guest

Yes, it sucks. I have an idea on how to improve this (keep reading).
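In case you don't want to read the script, here's a minimal sketch of
what it boils down to. This is illustrative rather than the script
itself; it assumes debugfs is mounted at /sys/kernel/debug, and the
exact fields printed by the tracepoint may differ between kernel
versions:

  # Enable the tracepoint before the guest (re)starts; KVM fires it
  # whenever it writes a vCPU's TSC offset
  echo 1 > /sys/kernel/debug/tracing/events/kvm/kvm_write_tsc_offset/enable

  # ... start the guest now ...

  # The new ("next") offset reported for the vCPU is the value to
  # pass to trace-cmd report --ts-offset later
  grep kvm_write_tsc_offset /sys/kernel/debug/tracing/trace

  echo 0 > /sys/kernel/debug/tracing/events/kvm/kvm_write_tsc_offset/enable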
2. Tracing
----------

In summary, what you have to do is:

1. run trace-cmd start -C x86-tsc in the host
2. run trace-cmd record -C x86-tsc in the guest
3. run trace-cmd stop in the host
4. run trace-cmd extract in the host
5. copy the guest's trace.dat file to a known location in the host

This guest script does all that:

http://people.redhat.com/~lcapitul/real-time/trace-host-and-guest

I run it like this:

# trace-host-and-guest cyclictest -q -n -b10 --notrace

3. Merging
----------

Merging is simple:

$ trace-cmd report -i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
    -i guest-trace.dat

For real-time KVM, we also want to see the difference in nanoseconds
between each line in the trace, so we use additional options:

$ trace-cmd report --ts-diff --ts2secs HOST-Hz -t \
    -i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
    -i guest-trace.dat

Here's a real example:

$ trace-cmd report --ts-diff --ts2secs 26000000000 -t \
    -i host-trace.dat --ts-offset $((18443676333795429734)) \
    -i guest-trace.dat

And here's a little extract of a merged trace where the host injects a
timer interrupt, the guest handles it and reprograms the next hrtimer.
The value in "()" is the number of nanoseconds between the previous
line and the current one:

host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749398857: (+88)  function: kvm_inject_pending_timer_irqs <-- kvm_arch_vcpu_ioctl_run
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749398990: (+133) kvm_entry: vcpu 0
guest-trace.dat: <idle>-0       [000] [000]  6196.749399096: (+106) function: hrtimer_interrupt <-- local_apic_timer_interrupt
guest-trace.dat: <idle>-0       [000] [000]  6196.749399123: (+27)  function: hrtimer_wakeup <-- __run_hrtimer
guest-trace.dat: <idle>-0       [000] [000]  6196.749399183: (+60)  function: tick_program_event <-- hrtimer_interrupt
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399219: (+36)  kvm_exit: reason MSR_WRITE rip 0xffffffff8104bf58 info 0 0
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399260: (+41)  function: kvm_set_lapic_tscdeadline_msr <-- kvm_set_msr_common
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399283: (+23)  function: hrtimer_start <-- start_apic_timer
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399336: (+53)  kvm_entry: vcpu 0

Unpinned use-cases
==================

If you can't pin the guest vCPU threads and the guest application like
we do in real-time KVM, you could try the following:

 * If your guest has a single CPU, or you want to trace a specific
   guest vCPU, then try to pass -P vCPU-TID when running "trace-cmd
   record" or "trace-cmd start" in the host

 * If you want to trace multiple vCPUs, I think you could try to trace
   all cores where the vCPUs could run with -M. Then you could try to
   merge this with the guest trace and see if you get a single timeline
   of all host cores and guest CPUs

trace-cmd-server
================

Everything I described could look like this:

# trace-cmd server   [ in the host ]
# trace-cmd record   [ in the guest ]
# trace-cmd report   [ in the host, to merge the traces ]

To achieve this, we need two things:

1. Add an interface to obtain the guest TSC offset from host
   user-space. Maybe we could have a new sysfs directory, with a file
   per vCPU thread and the offset as contents? Or maybe just add a new
   entry to /proc/, like /proc/TID/tsc-offset?

2. Build a trace-cmd-server, which does the following:

   1. Receive trace-cmd record commands from a guest, to be performed
      in the host
   2. Read the TSC offset for the vCPUs being traced
   3. Read the CPU frequency for --ts2secs
   4. Receive the guest's trace.dat and store it together with the
      host's

   (A rough shell mock-up of this flow is appended at the end of this
   mail.)

I'd suggest the default communication between guest and host be via
networking. Then we add vsock support when it's available.

Credits
=======

Real-time KVM tracing was originally developed by Marcelo Tosatti using
scripts. Steven Rostedt added the features we needed in trace-cmd at my
request. All the ideas and scripts shown here are mine. I also take
responsibility for any mistakes or nonsensical stuff.
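
PS: here is the rough mock-up of the trace-cmd-server host-side flow I
mentioned above, written as shell only to make the proposal concrete.
Note that /proc/TID/tsc-offset is the *proposed* interface and does not
exist today, and VCPU_TID is a placeholder you'd have to discover
yourself (e.g. from the QEMU process's threads):

  # 1. Read the TSC offset of the vCPU thread being traced
  #    (proposed interface, not available in current kernels)
  TSC_OFFSET=$(cat /proc/$VCPU_TID/tsc-offset)

  # 2. Trace the host side on behalf of the guest's "trace-cmd record",
  #    following only the vCPU thread as in the pinned case
  trace-cmd start -C x86-tsc -P $VCPU_TID
  # ... the guest runs its workload and records its own trace ...
  trace-cmd stop
  trace-cmd extract -o host-trace.dat

  # 3. Receive the guest's trace.dat (over networking, later vsock)
  #    and merge the two timelines
  trace-cmd report -i host-trace.dat --ts-offset $TSC_OFFSET \
      -i guest-trace.dat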