On 31/03/19 17:12, Borislav Petkov wrote:
> On Sun, Mar 31, 2019 at 04:20:11PM +0200, Paolo Bonzini wrote:
>> These are not slow path.
>
> Those functions do a *lot* of stuff like a bunch of MSR reads which are
> tens of cycles each at least.

The MSR reads and writes are not done in the common case.  Also, you
cannot really expect boot_cpu_data to be in L1 in these functions since
they run after the guest---or if they do, each L1 line you fill in with
host data is one line you "steal" from the guest.

Paolo

> I don't think a RIP-relative MOV and a BT:
>
>	movq	boot_cpu_data+20(%rip), %rax	# MEM[(const long unsigned int *)&boot_cpu_data + 20B], _45
>	btq	$59, %rax	#, _45
>
> are at all noticeable.
>
> On latest AMD and Intel uarch those are 2-4 cycles, according to
>
> https://agner.org/optimize/instruction_tables.ods