Heya,

So, Gleb suggested (thank you) trying the kvm unit tests in L1; here's some
(non-exhaustive) detail. I also ran a defconfig kernel build in a loop of 20
iterations (in L2), where every run after the first one is served from page
cache.

1. A couple of kvm unit tests, compiled 64-bit:
--------------------------------
$ ./configure
$ time make
[...]
gcc -g -D__x86_64__ -I../include/x86 -m64 -O1 -MMD -MF x86/.init.d -g -fomit-frame-pointer -Wall -fno-stack-protector -I. -std=gnu99 -ffreestanding -I lib -I lib/x86 -c -o x86/init.o x86/init.c
x86/init.c: In function ‘main’:
x86/init.c:110:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
gcc -g -D__x86_64__ -I../include/x86 -m64 -O1 -MMD -MF x86/.ini

real    0m14.358s
user    0m6.990s
sys     0m6.639s
--------------------------------

= MSR test =
--------------------------------
$ time qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio \
  -nographic -no-user-config -nodefaults -device \
  isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/msr.flat

enabling apic
MSR_IA32_APICBASE: PASS
MSR_IA32_APICBASE: PASS
IA32_SYSENTER_CS: PASS
MSR_IA32_SYSENTER_ESP: PASS
IA32_SYSENTER_EIP: PASS
MSR_IA32_MISC_ENABLE: PASS
MSR_IA32_CR_PAT: PASS
MSR_FS_BASE: PASS
MSR_GS_BASE: PASS
MSR_KERNEL_GS_BASE: PASS
MSR_EFER: PASS
MSR_LSTAR: PASS
MSR_CSTAR: PASS
MSR_SYSCALL_MASK: PASS
MSR_*STAR eager loading: PASS

15 tests, 0 failures

real    0m0.525s
user    0m0.147s
sys     0m0.121s
--------------------------------

= eventinj test =
--------------------------------
$ time qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio \
  -nographic -no-user-config -nodefaults -device \
  isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/eventinj.flat

enabling apic
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
Try to divide by 0
DE isr running divider is 0
Result is 150
DE exception: PASS
Try int 3
BP isr running
After int 3
BP exception: PASS
Try send vec 33 to itself
irq1 running
After vec 33 to itself
vec 33: PASS
Try int $33
irq1 running
After int $33
int $33: PASS
Try send vec 32 and 33 to itself
irq1 running
irq0 running
After vec 32 and 33 to itself
vec 32/33: PASS
Try send vec 32 and int $33
irq1 running
irq0 running
After vec 32 and int $33
vec 32/int $33: PASS
Try send vec 33 and 62 and mask one with TPR
irq1 running
After 33/62 TPR test
TPR: PASS
irq0 running
Try send NMI to itself
After NMI to itself
NMI: FAIL
Try int 33 with shadowed stack
irq1 running
After int 33 with shadowed stack
int 33 with shadowed stack: PASS

summary: 9 tests, 1 failures

real    0m0.586s
user    0m0.159s
sys     0m0.163s
--------------------------------

2. A couple of kvm unit tests, compiled 32-bit:
--------------------------------
$ yum install '*/stubs-32.h'
[...]
$ make ARCH=i386 clean all
[...]
gcc -g -D__i386__ -I /usr/src/kernels/3.10.0-0.rc1.git5.2.fc20.x86_64/include -I../include/x86 -m32 -O1 -MMD -MF x86/.eventinj.d -g -fomit-frame-pointer -Wall -fno-stack-protector -I. -nostdlib -o x86/eventinj.elf -Wl,-T,flat.lds x86/eventinj.o x86/cstart.o lib/libcflat.a /usr/lib/gcc/x86_64-redhat-linux/4.8.0/32/libgcc.a
objcopy -O elf32-i386 x86/eventinj.elf x86/eventinj.flat
[...]
--------------------------------

= eventinj test =
--------------------------------
$ time qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio \
  -nographic \
  -no-user-config -nodefaults -device \
  isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel \
  ./x86/eventinj.flat

[...]
After NMI to itself
NMI: FAIL
Try to divide by 0
PF running
DE isr running divider is 0
Result is 150
DE PF exceptions: PASS
Before NP test
PF running
NP isr running 400b5f err=18
irq1 running
After int33
NP PF exceptions: PASS
Try int 33 with shadowed stack
irq1 running
After int 33 with shadowed stack
int 33 with shadowed stack: PASS

summary: 14 tests, 1 failures

real    0m0.589s
user    0m0.188s
sys     0m0.127s
--------------------------------

= MSR test =
--------------------------------
$ time qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio -nographic \
> -no-user-config -nodefaults -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel \
> ./x86/msr.flat

enabling apic
MSR_IA32_APICBASE: PASS
MSR_IA32_APICBASE: PASS
IA32_SYSENTER_CS: PASS
MSR_IA32_SYSENTER_ESP: PASS
IA32_SYSENTER_EIP: PASS
MSR_IA32_MISC_ENABLE: PASS
MSR_IA32_CR_PAT: PASS
MSR_FS_BASE: PASS
MSR_GS_BASE: PASS
MSR_KERNEL_GS_BASE: PASS
MSR_LSTAR: PASS
MSR_CSTAR: PASS
MSR_SYSCALL_MASK: PASS

13 tests, 0 failures

real    0m0.499s
user    0m0.136s
sys     0m0.117s
--------------------------------

3. Kernel compile times over 20 iterations:

  http://kashyapc.fedorapeople.org/virt/Haswell-VMCS-Shadowing-tests/kernel-build-times-20-iterations.txt

Average "real" (elapsed) time: 9m44.495s.

The contents of the above file were obtained by first running this in L2
(nested guest):

$ for i in {1..20}; do make clean; \
  time make -j 3; done 2>&1 | tee \
  defconfig-make-output-20-iterations-l2.txt

and then running:

$ cat defconfig-make-output-20-iterations-l2.txt | \
  grep -i bzImage -A | tee \
  kernel-build-times-20-iterations.txt

So, the runs after the first one were served from page cache :)

I understand that I need to run many more iterations, and w/ more L2 guests,
to get a representative sample.

If the above looks useful, and you have more suggestions, I'd be glad to test
further.

(I'm testing with Fedora rawhide, nodebug kernels.)
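
For anyone who wants to re-derive an average "real" time from a log like the one above, here is a rough sketch of one way to do it. The function name average_real_time is made up for illustration, and the sketch assumes every timing line has the default bash `time` shape ("real", whitespace, then minutes and seconds such as 9m44.495s, with no hours field). It is not necessarily how the figure quoted above was computed.

```shell
#!/bin/sh
# Hypothetical helper: average the "real" lines of a bash `time` log.
# Assumption: timing lines look like "real<TAB>9m44.495s" (no hours field).
average_real_time() {
    awk '
        $1 == "real" {
            # "9m44.495s" -> t[1] = "9" (minutes), t[2] = "44.495" (seconds)
            split($2, t, /[ms]/)
            total += t[1] * 60 + t[2]
            n++
        }
        END {
            if (n == 0) { print "no timing lines found"; exit 1 }
            # Round to whole milliseconds first so we never print "60.000s".
            ms  = int(total / n * 1000 + 0.5)
            min = int(ms / 60000)
            sec = (ms % 60000) / 1000
            printf "%d runs, average real %dm%.3fs\n", n, min, sec
        }
    ' "$1"
}
```

It would be called as `average_real_time kernel-build-times-20-iterations.txt`. Because it only looks at lines whose first field is "real", it should work on the full make log as well as on the filtered file.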
PS: I just got some more memory on this Haswell machine (16G total), will run
a few more L2 guests (and multiple L1s w/ nested guests in each). I'll try to
run more workloads in L1, L2 (w/, w/o Shadow VMCS).

--
/kashyap