Ok, it seems to be related to the guest/host page sizes: if you run 2MB
pages in a VM on top of 4KB pages in the host, any invlpg in the VM appears
to flush all 2MB guest pages. I’ll try to find time to make sure there is
nothing else to it. Thanks for the assistance, and let me know if you need
my hacky tests.

The measurements below compare the VM and bare-metal cases:

        Host    Guest   Full Flush      Selective Flush
        PGsize  PGsize  (dTLB misses)   (dTLB misses)
        -----------------------------------------------
VM      4KB     4KB     103,008,052          93,172
        4KB     2MB     102,022,557     102,038,021
        2MB     4KB     103,005,083           2,888
        2MB     2MB       4,002,969           2,556

HOST    4KB              50,000,572             789
        2MB               1,000,454             537

Nadav Amit <nadav.amit@xxxxxxxxx> wrote:

> Argh... I don’t get the same behavior in the guest with the module test.
> I’ll need some more time to figure it out.
>
> Just a small comment regarding your “global” test: you forgot to set
> CR4.PGE.
>
> Once I set it, I get reasonable numbers (excluding the invlpg flavor):
>
>   with invlpg:       964431529
>   with full flush:   268190767
>   invlpg only        126114041
>   full flushes only  185971818
>   access net         111229828
>   w/full flush net    82218949  -> similar to access net
>   w/invlpg net       838317488
>
> I’ll be back when I have more understanding of the situation.
>
> Thanks,
> Nadav
>
>
> Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
>> On 16/05/2016 18:51, Nadav Amit wrote:
>>> Thanks! I appreciate it.
>>>
>>> I think your experiment with global paging just corroborates that the
>>> latency is caused by TLB misses. I measured TLB misses (and especially
>>> STLB misses) in other experiments, but not in this one. I will run some
>>> more experiments, specifically to test how AMD behaves.
>>
>> I'm curious about AMD too now...
>>
>>   with invlpg:       285,639,427
>>   with full flush:   584,419,299
>>   invlpg only         70,681,128
>>   full flushes only  265,238,766
>>   access net         242,538,804
>>   w/full flush net   319,180,533
>>   w/invlpg net       214,958,299
>>
>> Roughly the same with and without pte.g, so AMD behaves as it should.
>>
>>> I should note this is a byproduct of a study I did; it is not as if I
>>> was looking for strange behaviors (no more validation papers for me!).
>>>
>>> The strangest thing is that I don’t see this phenomenon on bare metal -
>>> I doubt it is a CPU “feature”. Once we understand it, at the very least
>>> it may affect the recommended value of tlb_single_page_flush_ceiling,
>>> which controls when the kernel performs a full TLB flush vs. selective
>>> flushes.
>>
>> Do you have a kernel module to reproduce the test on bare metal? (/me is
>> lazy).
>>
>> Paolo
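
For context on the knob mentioned above, here is a minimal sketch of the
decision that tlb_single_page_flush_ceiling controls: ranges below the
ceiling are invalidated one page at a time with invlpg, anything larger
falls back to a full TLB flush (CR3 reload). This is an illustration only,
not the kernel's actual flush_tlb_mm_range() code; flush_range(),
flush_one_page() and flush_everything() are placeholder names, the code
assumes 4KB pages and must run in ring 0, and the ceiling value of 33 is
the kernel default as far as I remember.

/*
 * Simplified sketch of the full-flush vs. selective-flush decision.
 * Illustration only -- not the actual arch/x86/mm/tlb.c implementation.
 */
#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)

/* Above this many pages, a full flush is assumed to be cheaper. */
static unsigned long tlb_single_page_flush_ceiling = 33;

static inline void flush_one_page(unsigned long addr)
{
        /* Invalidate the TLB entry for a single linear address. */
        asm volatile("invlpg (%0)" : : "r" (addr) : "memory");
}

static inline void flush_everything(void)
{
        /* Reloading CR3 flushes all non-global TLB entries. */
        unsigned long cr3;

        asm volatile("mov %%cr3, %0" : "=r" (cr3));
        asm volatile("mov %0, %%cr3" : : "r" (cr3) : "memory");
}

static void flush_range(unsigned long start, unsigned long end)
{
        unsigned long addr, pages = (end - start) >> PAGE_SHIFT;

        if (pages > tlb_single_page_flush_ceiling) {
                flush_everything();
                return;
        }

        for (addr = start; addr < end; addr += PAGE_SIZE)
                flush_one_page(addr);
}

The 2MB-guest-on-4KB-host row above suggests that, in that configuration,
the selective path costs about as much as a full flush, which is why the
recommended ceiling may need revisiting once the behavior is understood.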