On Wed, Mar 31, 2010 at 11:24:02AM -0500, Christoph Lameter wrote:
> On Wed, 31 Mar 2010, Andrea Arcangeli wrote:
>
> > I'm sorry if you answered someone already.
> >
> > The generic archs without the pmd approach can't mix hugepages and
> > regular pages in the same vma, so they can't provide graceful
> > fallback and never fail an allocation even though there is plenty
> > of memory free, which is one critical fundamental point in the
> > design (and later collapse those with khugepaged, which can also
> > run memory compaction asynchronously in the background and not
> > synchronously during page fault, which would be entirely worthless
> > for short lived allocations).
>
> Large pages would be more independent from the page table structure
> with the approach that I outlined earlier since you would not have to
> do these sync tricks.

I was talking about memory compaction. collapse_huge_page will still
be needed forever, regardless of whether split_huge_page exists or not.

> > About the HPAGE_PMD_ prefix: it's not only HPAGE_ like I did
> > initially, in case we later decide to split/collapse 1G pages too,
> > but frankly I think by the time memory size doubles 512 times across
> > the board (to make 1G pages a not totally wasted effort to implement
> > in the transparent hugepage support) we'd better move the PAGE_SIZE
> > to 2M and stick to the HPAGE_PMD_ again.
>
> There are applications that have benefited for years already from 1G
> page sizes (available on IA64 f.e.). So why wait?

Because the difficulty of finding free hugepages increases
exponentially with the order of the allocation. Plus, increasing
MAX_ORDER so much would slow down everything for no gain, because we
would fail to get 1G pages freed anyway. The cost of compacting 1G
pages is also 512 times bigger than with regular hugepages. It's not
feasible right now with current memory sizes. I just said it's
probably better to move to a 2M PAGE_SIZE instead of extending to 1G
pages in a kernel whose PAGE_SIZE is 4k.
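To put numbers on the orders and the factor of 512 discussed above, here is
a minimal userspace sketch of the arithmetic. It assumes the usual x86-64
sizes (4k base pages, 2M and 1G hugepages); the macro and function names are
made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/* Assumed x86-64 sizes; these names are illustrative, not kernel macros. */
#define BASE_PAGE_SHIFT 12 /* 4k base page */
#define HPAGE_2M_SHIFT  21 /* 2M hugepage  */
#define HPAGE_1G_SHIFT  30 /* 1G hugepage  */

/* Buddy-allocator order needed to back one hugepage of the given shift. */
static unsigned int order_for(unsigned int huge_shift)
{
	return huge_shift - BASE_PAGE_SHIFT;
}

/* How many units of size (1 << small_shift) fit in (1 << big_shift). */
static unsigned long units_in(unsigned int big_shift, unsigned int small_shift)
{
	return 1UL << (big_shift - small_shift);
}

/* Sanity checks for the figures used in this thread. */
static void check_hugepage_arithmetic(void)
{
	assert(order_for(HPAGE_2M_SHIFT) == 9);   /* 2M = order-9 allocation  */
	assert(order_for(HPAGE_1G_SHIFT) == 18);  /* 1G = order-18 allocation */
	assert(units_in(HPAGE_1G_SHIFT, HPAGE_2M_SHIFT) == 512);
	assert(units_in(HPAGE_1G_SHIFT, BASE_PAGE_SHIFT) == 262144);
}
```

Under these assumptions a 1G page needs an order-18 allocation against
order 9 for 2M, and it covers 512 times as much memory as a 2M page, or
262144 base pages of 4k.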

Last but not least, it can be done, but considering I'm already
failing to get 35 patches merged (and surely your comments aren't
helping in that direction...), it'd be counter-productive to make the
core even more complex with support for 1G pages immediately. In any
case the 1G support should be done at the very end of the patchset,
not in the core, or merging would be even harder: everything would
become more complex, requiring changes in two places instead of just
one all over the VM for every pagetable walk, and the split_huge_page
internals would become more complex too. Doing it incrementally also
allows the 1G support to be bisectable later.

In short, I think it makes zero sense to do it now, and I think it
makes no sense until memory sizes increase 512 times. In any case I
agreed to call it HPAGE_PMD_ and not HPAGE_ for a reason, so
discussing it now, or pointing at the lack of immediate monolithic,
non-bisectable 1G support, isn't a good reason for going against my
current patchset. We can defer this impractical 1G support until after
the useful 2M support is merged.

In fact I think the preferred way to do it (if we ever add it) is to
make 2M handling native first and then convert split_huge_page to be
the "compatibility fallback code" from 1G to 2M. Otherwise
split_huge_page would at times be forced to run a loop of 262144
iterations, which might become noticeable.