On 10/11/2014 13:15, Paolo Bonzini wrote:
>
>
> On 10/11/2014 11:45, Gleb Natapov wrote:
>>> I tried making also the other shared MSRs the same between guest and
>>> host (STAR, LSTAR, CSTAR, SYSCALL_MASK), so that the user return notifier
>>> has nothing to do.  That saves about 4-500 cycles on inl_from_qemu.  I
>>> do want to dig out my old Core 2 and see how the new test fares, but it
>>> really looks like your patch will be in 3.19.
>>
>> Please test on wide variety of HW before final decision.
>
> Yes, definitely.

I've reproduced Andy's results on Ivy Bridge:

    NX off             ~6900 cycles  (EFER)
    NX on, SCE off    ~14600 cycles  (urn)
    NX on, SCE on      ~6900 cycles  (same value)

I have also asked Intel for clarifications.

On Core 2 Duo the results are weird.  There is no LOAD_EFER control, so
Andy's patch does not apply and the only interesting paths are urn and
same value.  The pessimization of EFER writes does _seem_ to be there,
since I can profile for iTLB flushes (raw event r4082 on this
microarchitecture) and get:

    0.14%  qemu-kvm  [kernel.kallsyms]  [k] native_write_msr_safe
    0.14%  qemu-kvm  [kernel.kallsyms]  [k] native_flush_tlb

but these are the top two results, and it is not clear to me why perf
records them at only 0.14%...  Also, this machine has no EPT, so
virtualization suffers a lot from TLB misses anyway.

Nevertheless, I tried running kvm-unit-tests with different values of
the MSRs to see what the behavior is:

                                     NX=1/SCE=0   NX=1/SCE=1   all MSRs equal
    cpuid                                  3374         3448             3608
    vmcall                                 3274         3337             3478
    mov_from_cr8                             11           11               11
    mov_to_cr8                               15           15               15
    inl_from_pmtimer                      17803        16346            15156
    inl_from_qemu                         17858        16375            15163
    inl_from_kernel                        6351         6492             6622
    outl_to_kernel                         3850         3900             4053
    mov_dr                                  116          116              117
    ple-round-robin                          15           16               16
    wr_tsc_adjust_msr                      3334         3417             3570
    rd_tsc_adjust_msr                      3374         3404             3605
    mmio-no-eventfd:pci-mem               19188        17866            16660
    mmio-wildcard-eventfd:pci-mem          7319         7414             7595
    mmio-datamatch-eventfd:pci-mem         7304         7470             7605
    portio-no-eventfd:pci-io              13219        11780            10447
    portio-wildcard-eventfd:pci-io         3951         4024             4149
    portio-datamatch-eventfd:pci-io        3940         4026             4228

In the last column, all shared MSRs are equal (*) between host and guest.
The difference is very noisy on newer processors, but quite visible on
this older processor.  It is weird, though, that the light-weight exits
become _more_ expensive as more MSRs are made equal between guest and
host.  Anyhow, this is more of a curiosity, since the proposed patch has
no effect here.

Next will come Nehalem.  Nehalem has both LOAD_EFER and EPT, so it is
already a good target.  I can test Westmere too, as soon as I find
someone who has one, but it shouldn't give any surprises.

Paolo

(*) run this:

#! /usr/bin/env python
# Dump the shared MSRs (EFER, STAR, LSTAR, CSTAR, SYSCALL_MASK) of CPU 0
# as wrmsr() statements; requires the msr module and root privileges.
import struct

class msr(object):
    def __init__(self):
        try:
            self.f = open('/dev/cpu/0/msr', 'rb', 0)
        except:
            self.f = open('/dev/msr0', 'rb', 0)

    def read(self, index, default = None):
        # The MSR index is used as the offset into the device file.
        self.f.seek(index)
        try:
            return struct.unpack('Q', self.f.read(8))[0]
        except:
            return default

m = msr()
for i in [0xc0000080, 0xc0000081, 0xc0000082, 0xc0000083, 0xc0000084]:
    print("wrmsr(0x%x, 0x%x);" % (i, m.read(i)))

and add the result to the enable_nx function.
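For reference, the five indices in that loop are EFER (0xc0000080), STAR
(0xc0000081), LSTAR (0xc0000082), CSTAR (0xc0000083) and SYSCALL_MASK
(0xc0000084), i.e. exactly the MSRs discussed above.  Below is a minimal
sketch, not part of the original script, that prints the same registers
with their names; it assumes the same /dev/cpu/0/msr interface (msr module
loaded, run as root) and is only meant to make the host values easier to
compare by eye:

#! /usr/bin/env python
# Illustrative sketch: print the shared MSRs of CPU 0 with their names.
import struct

SHARED_MSRS = [
    (0xc0000080, 'EFER'),
    (0xc0000081, 'STAR'),
    (0xc0000082, 'LSTAR'),
    (0xc0000083, 'CSTAR'),
    (0xc0000084, 'SYSCALL_MASK'),
]

with open('/dev/cpu/0/msr', 'rb') as f:
    for index, name in SHARED_MSRS:
        f.seek(index)                             # MSR index doubles as file offset
        value = struct.unpack('Q', f.read(8))[0]  # each MSR is a 64-bit value
        print("%-12s (0x%x) = 0x%x" % (name, index, value))

The wrmsr() lines produced by the original script are still what goes
into enable_nx; this variant only helps to eyeball the values.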