On Fri, Nov 01, 2024 at 07:49:10AM -0700, Sean Christopherson wrote:
> On Fri, Nov 01, 2024, Bernhard Kauer wrote:
> > On Thu, Oct 31, 2024 at 04:29:52PM -0700, Sean Christopherson wrote:
> > > With a userspace APIC, the roundtrip to userspace to emulate the EOI is measured
> > > in tens of thousands of cycles.  IIRC, last I played around with userspace exits
> > > the average turnaround time was ~50k cycles.
> >
> >
> > That sound a lot so I did some quick benchmarking.  An exit is around 1400
> > TSC cycles on my AMD laptop, instruction emulation takes 1200 and going
> > to user-level needs at least 6200.  Not terribly slow but still room for
> > optimizations.
>
> Ah, I suspect my recollection of ~50k cycles is from measuring all exits to
> userspace, i.e. included the reaaaaly slow paths.

I finally found the reason for the slow user-level roundtrip on my Zen3+
machine.  Disabling SRSO with spec_rstack_overflow=off improves the
user-level part by 3x.  The exit as well as the instruction emulation
overhead is down by 40%.

Thus without SRSO a roundtrip to user-level needs roughly 2000 cycles.

                SRSO=off   default   factor
INSTR
        CPUID       1008      1394     1.4x
        RDMSR       1072      1550     1.4x
MMIO
        APIC        1666      2609     1.6x
        IOAPIC      1783      2800     1.6x
        HPET        3626      9426     2.6x
PIO
        PIC         1250      1804     1.4x
        UART        2837      8011     2.8x
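
For anyone who wants to recheck the arithmetic, here is a minimal sketch that recomputes the factor column from the cycle counts above.  The names (measurements, slowdown, userspace_extra) are made up for illustration.  The last part additionally assumes that HPET and UART are the devices emulated at user-level while APIC and PIC are handled in the kernel, so that the difference between the two approximates the user-level part of the roundtrip; that reading is an assumption, not something the numbers state by themselves.

```python
# Cycle counts copied from the table above: (SRSO=off, default).
measurements = {
    "CPUID":  (1008, 1394),
    "RDMSR":  (1072, 1550),
    "APIC":   (1666, 2609),
    "IOAPIC": (1783, 2800),
    "HPET":   (3626, 9426),
    "PIC":    (1250, 1804),
    "UART":   (2837, 8011),
}


def slowdown(srso_off, default):
    """Ratio of the default (SRSO mitigation on) cost to the SRSO=off cost."""
    return default / srso_off


# Factor column, rounded to one decimal like in the table.
factors = {name: round(slowdown(off, on), 1)
           for name, (off, on) in measurements.items()}


def userspace_extra(user_dev, kernel_dev):
    """Extra cycles of a user-level device over an in-kernel one.

    ASSUMPTION: user_dev is emulated at user-level and kernel_dev in the
    kernel, so the difference approximates the user-level part.
    Returns (extra with SRSO=off, extra with default settings).
    """
    u_off, u_on = measurements[user_dev]
    k_off, k_on = measurements[kernel_dev]
    return u_off - k_off, u_on - k_on


if __name__ == "__main__":
    for name, factor in factors.items():
        print(f"{name:8s}{factor:.1f}x")
    for user_dev, kernel_dev in (("HPET", "APIC"), ("UART", "PIC")):
        off, on = userspace_extra(user_dev, kernel_dev)
        print(f"{user_dev} - {kernel_dev}: {off} vs {on} cycles ({on / off:.1f}x)")
```

Under that assumption the user-level part comes out at roughly 1600 to 2000 cycles with SRSO=off versus 6200 to 6800 with the default, which is in line with the "roughly 2000 cycles" and "3x" figures above.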