On Mon, Apr 05, 2010 at 05:26:15PM -0700, Linus Torvalds wrote:
>
> On Tue, 6 Apr 2010, Andrea Arcangeli wrote:
> >
> > Some performance result:
>
> Quite frankly, these "performance results" seem to be basically dishonest.
>
> Judging by your numbers, the big win is apparently pre-populating the page
> tables, the "tlb miss" you quote seems to be almost in the noise. IOW, we
> have
>
>     memset page fault 1566023
>
> vs
>
>     memset page fault 2182476
>
> looking like a major performance advantage, but then the actual usage is
> much less noticeable.
>
> IOW, how much of the performance advantage would we get from a _much_
> simpler patch to just much more aggressively pre-populate the page tables
> (especially for just anonymous pages, I assume) or even just fault pages
> in several at a time when you have lots of memory?

I had a prefaulting patch that also allocated a hugepage but only mapped
it with 2 ptes, 4 ptes, 8 ptes, up to 256 ptes using a sysctl, until the
memset faulted in the rest and that triggered another chunk of prefault
on the remaining hugepage. In the end none of these were worth it, so I
went straight with a huge pmd immediately (even though initially I
worried about the more expensive clear-page in cow), which is hugely
simpler too and provides more than just a page fault advantage.

> In particular, when you quote 6% improvement for a kernel compile, your

The memset test you mention above was run on the host. The kernel compile
is run in the guest with an unmodified guest kernel, so the compile isn't
mangling pagetables any differently. The same compile is run against two
different host kernels, one with transparent hugepages and one without;
the guest kernel has no modifications at all. No page fault ever happens
in the host: only gcc runs in the guest, in an unmodified kernel that
isn't using hugepages at all.

> own numbers make me seriously wonder how many percentage points you'd get
> from just faulting in 8 pages at a time when you have lots of memory free,
> and use a single 3-order allocation to get those eight pages?
>
> Would that already shrink the difference between those "memset page
> faults" by a factor of eight?
>
> See what I'm saying?

I see what you're saying, but that has nothing to do with the 6% boost.
In short: I first measured the page fault improvement on the host (~50%
faster; as you say, that has nothing to do with pmd_huge or the tlb miss,
and I only mentioned it for curiosity). Then I measured the tlb miss
improvement on the host (a few percent faster, as usual with hugetlbfs).
Then I measured the boost in the guest when the host uses hugepages (with
no guest kernel change at all: just the tlb miss getting cheaper in the
guest speeds up the guest kernel compile by 6%). Then I ran some other
tests with dd, with all combinations of host/guest using hugepages or
not, and also with dd on bare metal with and without hugepages.

As said, gcc is a sort of worst case, so you can assume any math run in
the guest will be 6% faster or more if the host runs with transparent
hugepages enabled (and there's memory compaction etc.).

The page fault speedup is a "nice addon" that has nothing to do with the
kernel compile improvement, because the compile was repeated many times
and the guest kernel memory was already faulted in beforehand. I only
wanted to point it out "for curiosity", as I wrote in the previous email.
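
For reference, below is a minimal user-space sketch of the kind of memset
page-fault measurement discussed above. It is not the benchmark behind the
quoted numbers: the 1GB mapping size and the MADV_HUGEPAGE/MADV_NOHUGEPAGE
toggle are illustrative assumptions. It maps an anonymous region, optionally
asks for transparent hugepages via madvise(), memsets it, and reports the
minor fault delta from getrusage(); with hugepages backing the region the
fault count should drop by roughly a factor of 512 (2MB vs 4kB faults).

/*
 * Minimal sketch: count minor faults for a memset over an anonymous
 * mapping, with and without transparent hugepages requested via madvise.
 * Not the benchmark quoted above; size and flags are illustrative.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>

#define SIZE (1UL << 30)	/* 1GB anonymous mapping (assumption) */

static long minor_faults(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_minflt;
}

int main(int argc, char **argv)
{
	int want_huge = argc > 1 && !strcmp(argv[1], "huge");
	long before, after;
	char *p;

	p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

#ifdef MADV_HUGEPAGE
	/* Ask the kernel to back the region with (or without) THP. */
	madvise(p, SIZE, want_huge ? MADV_HUGEPAGE : MADV_NOHUGEPAGE);
#endif

	before = minor_faults();
	memset(p, 0xff, SIZE);	/* fault in the whole region */
	after = minor_faults();

	printf("%s: %ld minor faults for %lu bytes\n",
	       want_huge ? "thp" : "4k", after - before, SIZE);
	return 0;
}

Running it once with no argument and once with "huge" on a host that has
transparent hugepages enabled shows the page-fault-count side of the
comparison; it says nothing about the tlb miss or guest-side numbers.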