On 04/03/2022 14:30, Peter Zijlstra wrote:
> On Mon, Feb 14, 2022 at 01:09:05PM +0200, Adrian Hunter wrote:
>> Currently, using Intel PT to trace a VM guest is limited to kernel space
>> because decoding requires side band events such as MMAP and CONTEXT_SWITCH.
>> While these events can be collected for the host, there is not a way to do
>> that yet for a guest. One approach, would be to collect them inside the
>> guest, but that would require being able to synchronize with host
>> timestamps.
>>
>> The motivation for this patch is to provide a clock that can be used within
>> a VM guest, and that correlates to a VM host clock. In the case of TSC, if
>> the hypervisor leaves rdtsc alone, the TSC value will be subject only to
>> the VMCS TSC Offset and Scaling. Adjusting for that would make it possible
>> to inject events from a guest perf.data file, into a host perf.data file.
>>
>> Thus making possible the collection of VM guest side band for Intel PT
>> decoding.
>>
>> There are other potential benefits of TSC as a perf event clock:
>> 	- ability to work directly with TSC
>> 	- ability to inject non-Intel-PT-related events from a guest
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
>> ---
>>  arch/x86/events/core.c            | 16 +++++++++
>>  arch/x86/include/asm/perf_event.h |  3 ++
>>  include/uapi/linux/perf_event.h   | 12 ++++++-
>>  kernel/events/core.c              | 57 +++++++++++++++++++------------
>>  4 files changed, 65 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index e686c5e0537b..51d5345de30a 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -2728,6 +2728,17 @@ void arch_perf_update_userpage(struct perf_event *event,
>>  		!!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT);
>>  	userpg->pmc_width = x86_pmu.cntval_bits;
>>
>> +	if (event->attr.use_clockid &&
>> +	    event->attr.ns_clockid &&
>> +	    event->attr.clockid == CLOCK_PERF_HW_CLOCK) {
>> +		userpg->cap_user_time_zero = 1;
>> +		userpg->time_mult = 1;
>> +		userpg->time_shift = 0;
>> +		userpg->time_offset = 0;
>> +		userpg->time_zero = 0;
>> +		return;
>> +	}
>> +
>>  	if (!using_native_sched_clock() || !sched_clock_stable())
>>  		return;
>
> This looks the wrong way around. If TSC is found unstable, we should
> never expose it.

Intel PT traces contain TSC whether or not it is stable, and it could
still be usable in some cases e.g. short traces on a single CPU.
Ftrace seems to offer x86-tsc unconditionally as a clock. We could
add warnings to comments and documentation about its potential pitfalls.

>
> And I'm not at all sure about the whole virt thing. Last time I looked
> at pvclock it made no sense at all.

It is certainly not useful for synchronizing events against TSC.
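
For reference, a minimal sketch (not part of the patch) of the two conversions discussed above. The first function follows the cycles-to-time formula documented for cap_user_time_zero in include/uapi/linux/perf_event.h; with the values set in the hunk (time_mult = 1, time_shift = 0, time_zero = 0) it reduces to the identity, so the perf timestamp is the raw TSC value. The second shows how a host TSC value might map to a guest TSC value under offset and scaling. The function names are hypothetical, the number of fractional bits in the multiplier is passed in rather than assumed (48 is my understanding for VMX), and unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a raw cycle count to a perf timestamp using the
 * perf_event_mmap_page fields, per the cap_user_time_zero formula in
 * include/uapi/linux/perf_event.h. Hypothetical helper name.
 */
static uint64_t cyc_to_perf_time(uint64_t cyc, uint64_t time_zero,
				 uint32_t time_mult, uint16_t time_shift)
{
	uint64_t quot, rem;

	assert(time_shift < 64);

	quot = cyc >> time_shift;
	rem = cyc & (((uint64_t)1 << time_shift) - 1);

	return time_zero + quot * time_mult +
	       ((rem * time_mult) >> time_shift);
}

/*
 * Sketch of guest TSC as seen under TSC scaling and offsetting:
 *   guest = ((host * multiplier) >> frac_bits) + offset
 * Assumption: multiplier is a fixed-point value with frac_bits
 * fractional bits; offset is added with 64-bit wraparound, so a
 * negative offset is passed in two's complement form.
 * Hypothetical helper name.
 */
static uint64_t host_tsc_to_guest_tsc(uint64_t host_tsc, uint64_t multiplier,
				      unsigned int frac_bits, uint64_t offset)
{
	unsigned __int128 scaled;

	assert(frac_bits < 128);

	/* 128-bit intermediate avoids overflow in the multiplication. */
	scaled = (unsigned __int128)host_tsc * multiplier;

	return (uint64_t)(scaled >> frac_bits) + offset;
}
```

With time_mult = 1 and time_shift = 0 the remainder term is always zero, which is why the hunk can leave time_offset and time_zero at 0 and still let user space treat the clock as plain TSC.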