Re: [PATCH 5/5] [WIP] trace-cmd: Add new subcommand "trace-cmd perf"

On Fri, 19 Feb 2021 19:56:59 +0200
Tzvetomir Stoyanov <tz.stoyanov@xxxxxxxxx> wrote:


> > Make sense?  
> Yes, but I have two concerns:
>  1. Recording kvm enter / exit events will make this implementation
>     KVM specific, how about other hypervisors?

I think this should all be part of the protocol. The option can be
TRACECMD_OPTION_GUEST_METADATA, which has fields for "enter/exit"
events with CPU, vCPU and timestamp, and also a note of the sched switch
events.

That is, this implementation will enable kvm events, but other trace-cmd
plugins can implement other events. Either way, what gets recorded into the
trace.dat file can be post processed into a generic format.
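
To picture it, here is a rough sketch of what a single record of that
option could carry. This is purely illustrative; the actual option layout
(and the field encoding) would be whatever we settle on in the protocol:

enum guest_meta_type {
	GUEST_META_VCPU_ENTER,		/* kvm_entry or hypervisor equivalent */
	GUEST_META_VCPU_EXIT,		/* kvm_exit or hypervisor equivalent */
	GUEST_META_SCHED_SWITCH,	/* vCPU thread scheduled in/out on the host */
};

/* Hypothetical record for the guest metadata option */
struct guest_meta_event {
	unsigned long long	host_ts;	/* host clock timestamp */
	unsigned int		type;		/* enum guest_meta_type */
	unsigned int		host_cpu;	/* physical CPU of the event */
	unsigned int		vcpu;		/* guest vCPU involved */
	unsigned int		pad;
};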


>  2. The problem with finding the guest event -> host CPU relation. We know
>     only the timestamp of the guest event, not yet synchronized with the
>     host time. How will we find on what host CPU that guest event happened,
>     as the host task CPU migration mapping is recorded with host time?

It would all be done in post processing. Here's the workflow I envision:

 1. Run pre synchronization protocol between host/guest (ie. kvm, ptp)

 2. Start the meta data events.
	Start tracing guest enter/exit events along with scheduling.
	We may even be able to filter specifically if we know the threads
	ahead of time, but that's just an optimization.
	These events will be recorded raw into a file (with host threads
	saving the events to disk like it does for the tracing data).
	All this is in a separate instance.

 3. Run tracing
	Recorded events on the guest are sent to the host and stored in raw
	format on the host.
	Optionally, events on the host are also recorded.

 4. At the end of tracing
	Stop the recording of tracing.
	Stop the recording of meta data.
	Run post synchronization protocol between host and guest.

 5. Read the meta data events and convert them into a list of timestamped
    enter/exit events and migration events. Yes, the timestamps will be
    those of the host.
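
For step 5, a first pass over the raw meta data could look roughly like the
sketch below: walk the events in host-time order and note whenever a vCPU
shows up on a different physical CPU than before. The types and names here
are made up for illustration, not real trace-cmd structures:

#include <stdint.h>
#include <stdio.h>

struct meta_event {
	uint64_t host_ts;	/* host clock timestamp */
	int      vcpu;		/* guest vCPU */
	int      host_cpu;	/* physical CPU the enter/exit happened on */
};

#define MAX_VCPUS 256

/* Walk events sorted by host_ts and report vCPU -> physical CPU migrations */
static void report_migrations(const struct meta_event *ev, int nr_events)
{
	int last_cpu[MAX_VCPUS];
	int i;

	for (i = 0; i < MAX_VCPUS; i++)
		last_cpu[i] = -1;

	for (i = 0; i < nr_events; i++) {
		int v = ev[i].vcpu;

		if (last_cpu[v] >= 0 && last_cpu[v] != ev[i].host_cpu)
			printf("vCPU %d migrated from CPU %d to CPU %d at host ts %llu\n",
			       v, last_cpu[v], ev[i].host_cpu,
			       (unsigned long long)ev[i].host_ts);
		last_cpu[v] = ev[i].host_cpu;
	}
}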

A couple of things to understand with step 5, and assumptions we can make.

The scale factor between the host and guest for each VCPU should not
change, even if the VCPU migrates. If it does, then there had to have been a
kernel event where the kernel knew about this change, and we need to find a
way to extract that. For now, we shouldn't worry about it, as I'm guessing
that could be rather complex to manage even on the kernel end, and it would
also be expensive for migrations.
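
Roughly, the scaling itself is just the usual clocksource-style mult/shift
math; a minimal sketch, with the assumption spelled out in the comment:

#include <stdint.h>

/*
 * Standard clocksource-style scaling: ns = (cycles * mult) >> shift
 * (ignoring multiply overflow for simplicity). The assumption above is
 * that (mult, shift) for a given VCPU stays constant, even when that
 * VCPU migrates between physical CPUs.
 */
static inline uint64_t scale_ts(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}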

I'm making a big assumption here: that either the different VCPUs have
different scaling factors but the host scaling between physical CPUs is
the same, or the physical CPUs have different scaling factors but the
VCPUs do not. So far I have run this on 4 different machines (including my
laptop) and all had the same mult and shift on the host. Perhaps we just
state that we do not support machines where this does not hold.

Anyway, when all is done, the meta data events will map to the host. Using
this mapping and the synchronization between the host and guest, we can map
these migrations and such to the guest itself. And if the VCPU scaling
factors are different (and we assume the host is the same for all), and we
know when a VCPU was scheduled away from a physical CPU (in host time), we
can then figure out the guest time when that happened.
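
To make that concrete, here is a sketch of the translation, assuming the
synchronization step boils down to a per-VCPU linear correction (scale plus
offset). The structure and field names are made up for illustration:

#include <stdint.h>

/* Hypothetical result of the host/guest time synchronization for one vCPU */
struct vcpu_timesync {
	uint32_t mult;		/* scaling factor from the sync step */
	uint32_t shift;
	int64_t  offset;	/* guest_ts minus the scaled host_ts */
};

/* Translate a host timestamp (e.g. a migration point) into guest time */
static uint64_t host_to_guest_ts(uint64_t host_ts,
				 const struct vcpu_timesync *sync)
{
	return ((host_ts * sync->mult) >> sync->shift) + sync->offset;
}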

Or did I miss something?

-- Steve


