On Fri, Apr 16, 2010 at 11:13:10AM -0500, Christoph Lameter wrote:
> On Thu, 15 Apr 2010, Andrea Arcangeli wrote:
>
> > 2) add alloc_pages_vma for numa awareness in the huge page faults
>
> How do interleave policies work with alloc_pages_vma? So far the
> semantics is to spread 4k pages over different nodes. With 2M pages
> this can no longer work the way it was.

static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
					  unsigned nid)

See the order parameter, so I hope it's already solved.

I assume the idea would be to interleave 2M pages to avoid the CPU and
memory overhead of the pte layer and to decrease the tlb misses, while
still maxing out the bandwidth of the system when multiple threads
access memory stored on different nodes at random. It should be ideal
for hugetlbfs too, for the large shared memory pools of the DB. Surely
it'll be better than having all hugepages come from the same node even
though MPOL_INTERLEAVE is set.

That said, it'd also be possible to disable hugepages if the vma has
MPOL_INTERLEAVE set, but I doubt we want to do that by default. Maybe
we can add a sysfs control for that later, which could be further
tweaked at boot time by per-arch quirks, dunno...

It's really up to you, as you know numa better, but I've no doubt that
MPOL_INTERLEAVE can also make sense with hugepages (both hugetlbfs and
transparent hugepage support).

Thanks,
Andrea
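
P.S. To make the interleave-with-order idea concrete, here's a minimal
sketch (not the actual patch) of how an order-aware alloc_pages_vma()
could key the rotation on the huge page boundary. interleave_nid(),
alloc_page_interleave(), policy_zonelist() and policy_nodemask() are
the existing mm/mempolicy.c helpers; the PAGE_SHIFT + order shift is
my assumption of how consecutive 2M-aligned regions would round-robin
across the nodemask:

	/*
	 * Sketch only: like alloc_page_vma(), but order-aware.
	 */
	struct page *alloc_pages_vma(gfp_t gfp, int order,
				     struct vm_area_struct *vma,
				     unsigned long addr)
	{
		struct mempolicy *pol = get_vma_policy(current, vma, addr);
		struct zonelist *zl;
		struct page *page;

		if (unlikely(pol->mode == MPOL_INTERLEAVE)) {
			unsigned nid;

			/*
			 * With shift = PAGE_SHIFT + order, every 4k
			 * page inside one 2M-aligned region hashes to
			 * the same node, and consecutive 2M regions
			 * rotate across the interleave nodemask.
			 */
			nid = interleave_nid(pol, vma, addr,
					     PAGE_SHIFT + order);
			mpol_cond_put(pol);
			return alloc_page_interleave(gfp, order, nid);
		}

		/* other policies: same as the order-0 alloc_page_vma() path */
		zl = policy_zonelist(gfp, pol);
		page = __alloc_pages_nodemask(gfp, order, zl,
					      policy_nodemask(gfp, pol));
		mpol_cond_put(pol);
		return page;
	}

The only interleave-relevant change versus the 4k path is the shift
passed to interleave_nid(), so hugetlbfs could reuse the same trick
with its own page shift.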