Very recently, trace-cmd got a few new features that allow you to merge
host and guest kernel traces using the host TSC. Those features
originated in the tracing we're doing to debug spikes in real-time KVM.

However, as real-time KVM uses a very specific setup and as we have so
far debugged a very simple application, I'm wondering: are these
features useful for general, non-real-time use-cases? If the answer is
yes, then I've got several ideas on how to make host and guest trace
merging extremely simple to use.

I'll first describe how we do tracing for real-time KVM. Then I'll give
some suggestions on how to use the same procedure for unpinned
use-cases. Lastly, I'll talk about how we could make it easy to use.

Real-time KVM host and guest tracing
====================================

In real-time KVM, each guest vCPU is pinned to a different host core.
The real-time application running in the guest is also pinned. When we
get a spike, we know in which guest CPU it occurred, and so we know in
which host core that CPU was running. All we have to do is to get a
trace of that guest CPU/host core pair.

1. Setup
--------

You'll need the following:

1. A stable TSC
2. The TSC offset of the guest you want to debug (see below)
3. A way for your guest to transfer a file to the host (I use networking)
4. Latest trace-cmd.git in both host and guest (HEAD c21aae2c or later)

To get the TSC offset of the guest, you can use the kvm_write_tsc_offset
tracepoint in the host. I use this script to do it:

http://people.redhat.com/~lcapitul/real-time/start-debug-guest

Yes, it sucks. I have an idea on how to improve this (keep reading).
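In case you don't want to read the script, here's a minimal sketch of
what it boils down to. This is illustrative rather than the script
itself; it assumes debugfs is mounted at /sys/kernel/debug, and the
exact fields printed by the tracepoint may differ between kernel
versions:

  # Enable the tracepoint before the guest (re)starts; KVM fires it
  # whenever it writes a vCPU's TSC offset
  echo 1 > /sys/kernel/debug/tracing/events/kvm/kvm_write_tsc_offset/enable

  # ... start the guest now ...

  # The new ("next") offset reported for the vCPU is the value to
  # pass to trace-cmd report --ts-offset later
  grep kvm_write_tsc_offset /sys/kernel/debug/tracing/trace

  echo 0 > /sys/kernel/debug/tracing/events/kvm/kvm_write_tsc_offset/enable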
2. Tracing
----------

In summary, what you have to do is:

1. run trace-cmd start -C x86-tsc in the host
2. run trace-cmd record -C x86-tsc in the guest
3. run trace-cmd stop in the host
4. run trace-cmd extract in the host
5. copy the guest's trace.dat file to a known location in the host

This guest script does all that:

http://people.redhat.com/~lcapitul/real-time/trace-host-and-guest

I run it like this:

# trace-host-and-guest cyclictest -q -n -b10 --notrace

3. Merging
----------

Merging is simple:

$ trace-cmd report -i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
    -i guest-trace.dat

For real-time KVM, we also want to see the difference in nanoseconds
between each line in the trace, so we use additional options:

$ trace-cmd report --ts-diff --ts2secs HOST-Hz -t \
    -i host-trace.dat --ts-offset $((GUEST-TSC-offset)) \
    -i guest-trace.dat

Here's a real example:

$ trace-cmd report --ts-diff --ts2secs 26000000000 -t \
    -i host-trace.dat --ts-offset $((18443676333795429734)) \
    -i guest-trace.dat

And here's a little extract of a merged trace where the host injects a
timer interrupt, the guest handles it and reprograms the next hrtimer.
The value in "()" is the number of nanoseconds between the previous
line and the current one:

host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749398857: (+88)  function: kvm_inject_pending_timer_irqs <-- kvm_arch_vcpu_ioctl_run
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749398990: (+133) kvm_entry: vcpu 0
guest-trace.dat: <idle>-0       [000] [000]  6196.749399096: (+106) function: hrtimer_interrupt <-- local_apic_timer_interrupt
guest-trace.dat: <idle>-0       [000] [000]  6196.749399123: (+27)  function: hrtimer_wakeup <-- __run_hrtimer
guest-trace.dat: <idle>-0       [000] [000]  6196.749399183: (+60)  function: tick_program_event <-- hrtimer_interrupt
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399219: (+36)  kvm_exit: reason MSR_WRITE rip 0xffffffff8104bf58 info 0 0
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399260: (+41)  function: kvm_set_lapic_tscdeadline_msr <-- kvm_set_msr_common
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399283: (+23)  function: hrtimer_start <-- start_apic_timer
host-trace.dat:  qemu-kvm-3699  [004] [004]  6196.749399336: (+53)  kvm_entry: vcpu 0

Unpinned use-cases
==================

If you can't pin the guest vCPU threads and the guest application like
we do in real-time KVM, you could try the following:

 * If your guest has a single CPU, or you want to trace a specific
   guest vCPU, then try to pass -P vCPU-TID when running "trace-cmd
   record" or "trace-cmd start" in the host

 * If you want to trace multiple vCPUs, I think you could try to trace
   all cores where the vCPUs could run with -M. Then you could try to
   merge this with the guest trace and see if you get a single timeline
   of all host cores and guest CPUs

trace-cmd-server
================

Everything I described could look like this:

# trace-cmd server   [ in the host ]
# trace-cmd record   [ in the guest ]
# trace-cmd report   [ in the host, to merge the traces ]

To achieve this, we need two things:

1. Add an interface to obtain the guest TSC offset from host
   user-space. Maybe we could have a new sysfs directory, with a file
   per vCPU thread and the offset as contents? Or maybe just add a new
   entry to /proc/, like /proc/TID/tsc-offset?

2. Build a trace-cmd-server, which does the following:

   1. Receive trace-cmd record commands from a guest, to be performed
      in the host
   2. Read the TSC offset for the vCPUs being traced
   3. Read the CPU frequency for --ts2secs
   4. Receive the guest's trace.dat and store it together with the
      host's

   (A rough shell mock-up of this flow is appended at the end of this
   mail.)

I'd suggest the default communication between guest and host be via
networking. Then we add vsock support when it's available.

Credits
=======

Real-time KVM tracing was originally developed by Marcelo Tosatti using
scripts. Steven Rostedt added the features we needed in trace-cmd at my
request. All the ideas and scripts shown here are mine. I also take
responsibility for any mistakes or nonsensical stuff.
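
PS: here is the rough mock-up of the trace-cmd-server host-side flow I
mentioned above, written as shell only to make the proposal concrete.
Note that /proc/TID/tsc-offset is the *proposed* interface and does not
exist today, and VCPU_TID is a placeholder you'd have to discover
yourself (e.g. from the QEMU process's threads):

  # 1. Read the TSC offset of the vCPU thread being traced
  #    (proposed interface, not available in current kernels)
  TSC_OFFSET=$(cat /proc/$VCPU_TID/tsc-offset)

  # 2. Trace the host side on behalf of the guest's "trace-cmd record",
  #    following only the vCPU thread as in the pinned case
  trace-cmd start -C x86-tsc -P $VCPU_TID
  # ... the guest runs its workload and records its own trace ...
  trace-cmd stop
  trace-cmd extract -o host-trace.dat

  # 3. Receive the guest's trace.dat (over networking, later vsock)
  #    and merge the two timelines
  trace-cmd report -i host-trace.dat --ts-offset $TSC_OFFSET \
      -i guest-trace.dat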