On Thu, Jun 01, 2023 at 07:07:38PM +0000, Steven Noonan wrote:
> One issue is how much overhead it has. This is an instruction that
> normally executes in roughly 50 clock cycles (RDTSC) to 100 clock
> cycles (RDTSCP) on Zen 3. Based on a proof-of-concept I wrote, the
> overhead of trapping and emulating with a signal handler is roughly
> 100x. On my Zen 3 system, it goes up to around 10000 clock cycles per
> trapped read of RDTSCP.

What about kernel-based emulation? You could tie it into user_dispatch
and have a user_dispatch TSC offset.

So regular kernel emulation simply returns the native value (which keeps
the VDSO working, for one), but from within a user_dispatch range it
returns the native value plus the offset.

That is, how slow is the below?

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 58b1f208eff5..18175b45db1f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -645,6 +645,25 @@ static bool fixup_iopl_exception(struct pt_regs *regs)
 	return true;
 }
 
+static bool fixup_rdtsc_exception(struct pt_regs *regs)
+{
+	unsigned short bytes;
+	u32 eax, edx;
+
+	if (get_user(bytes, (const unsigned short __user *)regs->ip))
+		return false;
+
+	if (bytes != 0x310f)	/* RDTSC is 0f 31; x86 loads it little-endian */
+		return false;
+
+	asm volatile ("rdtsc" : "=a" (eax), "=d" (edx));
+	regs->ax = eax;
+	regs->dx = edx;
+
+	regs->ip += 2;
+	return true;
+}
+
 /*
  * The unprivileged ENQCMD instruction generates #GPs if the
  * IA32_PASID MSR has not been populated. If possible, populate
@@ -752,6 +771,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_iopl_exception(regs))
 			goto exit;
 
+		if (fixup_rdtsc_exception(regs))
+			goto exit;
+
 		if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
 			goto exit;
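For illustration only, here is a user-space model of the decode step in the fixup plus the per-range offset idea. It is a sketch under stated assumptions, not kernel code: `struct fake_regs`, `struct fake_dispatch`, `emulate_rdtsc()` and the flat `text` buffer (standing in for user memory starting at address 0) are all hypothetical, and the "dispatch window with a TSC offset" is only the suggestion above, not an existing kernel interface. It also shows why the opcode check has to compare against 0x310f: the bytes 0f 31 read as a little-endian 16-bit value give 0x310f, not 0x0f31.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the subset of struct pt_regs the fixup touches. */
struct fake_regs {
	uint64_t ip;
	uint64_t ax;
	uint64_t dx;
};

/* Hypothetical user_dispatch-style window that gets a TSC offset applied. */
struct fake_dispatch {
	uint64_t start;
	uint64_t len;
	uint64_t tsc_offset;
};

/*
 * Model of the proposed emulation: "text" stands in for user memory
 * starting at address 0, "tsc" for the value a real RDTSC would return.
 * Returns false if the faulting instruction is not RDTSC.
 */
static bool emulate_rdtsc(struct fake_regs *regs, const uint8_t *text,
			  size_t text_len, uint64_t tsc,
			  const struct fake_dispatch *d)
{
	uint16_t bytes;

	/* Both opcode bytes must be readable (models get_user() failing). */
	if (text_len < 2 || regs->ip > text_len - 2)
		return false;

	/* Assemble a little-endian u16, as a 2-byte load does on x86. */
	bytes = (uint16_t)(text[regs->ip] | (text[regs->ip + 1] << 8));

	/* RDTSC is encoded as 0f 31, which reads back as 0x310f. */
	if (bytes != 0x310f)
		return false;

	/* Inside the (hypothetical) dispatch window: native value + offset. */
	if (d && regs->ip >= d->start && regs->ip - d->start < d->len)
		tsc += d->tsc_offset;

	/* RDTSC returns the low half in EAX and the high half in EDX. */
	regs->ax = (uint32_t)tsc;
	regs->dx = (uint32_t)(tsc >> 32);
	regs->ip += 2;	/* skip the 2-byte instruction */
	return true;
}
```

Outside any window the model hands back the native value unchanged, matching the "regular kernel emulation" case; only instruction pointers inside the window see the shifted clock.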