On Sat, Sep 29, 2012 at 07:30:06AM -0700, Andi Kleen wrote:
> On Sat, Sep 29, 2012 at 03:48:11PM +0200, Andrea Arcangeli wrote:
> > On Sat, Sep 29, 2012 at 02:37:18AM +0300, Kirill A. Shutemov wrote:
> > > Cons:
> > >  - increases TLB pressure;
> >
> > I generally don't like using 4k tlb entries ever. This only has the
>
> From theory I would also prefer the 2MB huge page.
>
> But some numbers comparing between the two alternatives are definitely
> interesting. Numbers are often better than theory.

Sure, good idea. It's just that the standard benchmarks likely aren't
using zero pages, so I suggest a basic micro benchmark:

   some loop of() {
      memcmp(uninitialized_pointer, (char *)uninitialized_pointer+4G, 4G);
      barrier();
   }

> > There would be a small cache benefit here... but even then some first
> > level caches are virtually indexed IIRC (always physically tagged to
>
> Modern x86 doesn't have virtually indexed caches.

With the above memcmp, I'm quite sure the previous patch will beat the
new one by a wide margin, especially on modern x86 with more 2M TLB
entries and >= 8MB L2 caches.

But I agree we need to verify it before taking a decision, and that
the numbers are better than theory, or to rephrase it "let's check the
theory is right" :)
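
For anyone who wants to try the comparison, here is a minimal userspace
sketch of the memcmp loop suggested above. It is an illustration under
assumptions, not a benchmark anyone in this thread actually ran: the
function name zero_memcmp_bench, the ZERO_BENCH_MAIN guard and the
size/iteration parameters are all made up for the example. It maps
anonymous memory read-only and never writes to it, so the reads should
be backed by whatever zero page the running kernel provides. The mapping
is not forced to 2MB alignment, and the MADV_HUGEPAGE hint is only
issued if the headers define it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper: compare two untouched anonymous regions of `len`
 * bytes each, `iters` times, and store the elapsed seconds in *secs.
 * Returns 0 if the regions compared equal, 1 if not, -1 on error.
 */
static int zero_memcmp_bench(size_t len, int iters, double *secs)
{
    struct timespec t0, t1;
    size_t total = 2 * len;
    int diff = 0;
    char *p;

    assert(secs != NULL);
    if (len == 0 || iters <= 0)
        return -1;

    /* Never written to, so read faults should map the zero page. */
    p = mmap(NULL, total, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

#ifdef MADV_HUGEPAGE
    /* Only a hint; the return value is ignored on purpose. */
    (void)madvise(p, total, MADV_HUGEPAGE);
#endif

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        diff |= memcmp(p, p + len, len);
        /* Compiler barrier so the loop body is not hoisted or merged. */
        __asm__ __volatile__("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *secs = (double)(t1.tv_sec - t0.tv_sec) +
            (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    munmap(p, total);
    return diff != 0;
}

#ifdef ZERO_BENCH_MAIN
int main(void)
{
    size_t len = (size_t)64 << 20;  /* 64MB per side; assumed, not 4G */
    int iters = 10;
    double secs = 0.0;
    int rc = zero_memcmp_bench(len, iters, &secs);

    if (rc < 0) {
        perror("zero_memcmp_bench");
        return 1;
    }
    printf("%d x memcmp of %zu bytes: %.6f s (equal=%d)\n",
           iters, len, secs, rc == 0);
    return 0;
}
#endif
```

Building with -DZERO_BENCH_MAIN gives a standalone binary. To approach
the 4G case from the pseudocode, raise len on a 64-bit machine and run
the same binary on kernels with each of the two patches, ideally along
with TLB-miss counters from perf.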