Factor out one small part of the shmem pmd handling: the inline function
vma_adjust_trans_huge() (called when vmas are split or merged) contains
a preliminary !anon_vma || vm_ops check to avoid the overhead of
__vma_adjust_trans_huge() on areas which could not possibly contain an
anonymous THP pmd.  But with huge tmpfs, we shall need it to be called
even in those excluded cases.

Before the split pmd ptlocks, there was a nice alternative optimization
to make: avoid the overhead of __vma_adjust_trans_huge() on mms which
could not possibly contain a huge pmd - those with NULL pmd_huge_pte
(using a huge pmd demands the deposit of a spare page table, typically
stored in a list at pmd_huge_pte, withdrawn for use when splitting the
pmd; and huge tmpfs will follow that protocol too).

Still use that optimization when !USE_SPLIT_PMD_PTLOCKS, when
mm->pmd_huge_pte is updated under mm->page_table_lock (but beware:
unlike other arches, powerpc made no use of pmd_huge_pte before, so
this patch hacks it to update pmd_huge_pte as a count).

In common configs, no equivalent optimization on x86 now: if that's a
visible problem, we can add an atomic count or flag to mm for the
purpose.

And looking into the overhead of __vma_adjust_trans_huge(): it is silly
for split_huge_page_pmd_mm() to be calling find_vma() followed by
split_huge_page_pmd(), when it can check the pmd directly first, and
usually avoid the find_vma() call.

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---
 arch/powerpc/mm/pgtable_64.c |    7 ++++++-
 include/linux/huge_mm.h      |    5 ++++-
 mm/huge_memory.c             |    7 ++-----
 3 files changed, 12 insertions(+), 7 deletions(-)

--- thpfs.orig/arch/powerpc/mm/pgtable_64.c	2015-02-08 18:54:22.000000000 -0800
+++ thpfs/arch/powerpc/mm/pgtable_64.c	2015-02-20 19:34:32.363944978 -0800
@@ -675,9 +675,12 @@ void pgtable_trans_huge_deposit(struct m
 				pgtable_t pgtable)
 {
 	pgtable_t *pgtable_slot;
+
 	assert_spin_locked(&mm->page_table_lock);
+	mm->pmd_huge_pte++;
 	/*
-	 * we store the pgtable in the second half of PMD
+	 * we store the pgtable in the second half of PMD; but must also
+	 * set pmd_huge_pte for the optimization in vma_adjust_trans_huge().
 	 */
 	pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
 	*pgtable_slot = pgtable;
@@ -696,6 +699,8 @@ pgtable_t pgtable_trans_huge_withdraw(st
 	pgtable_t *pgtable_slot;
 
 	assert_spin_locked(&mm->page_table_lock);
+	mm->pmd_huge_pte--;
+
 	pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
 	pgtable = *pgtable_slot;
 	/*
--- thpfs.orig/include/linux/huge_mm.h	2014-12-07 14:21:05.000000000 -0800
+++ thpfs/include/linux/huge_mm.h	2015-02-20 19:34:32.363944978 -0800
@@ -143,8 +143,11 @@ static inline void vma_adjust_trans_huge
 					 unsigned long end,
 					 long adjust_next)
 {
-	if (!vma->anon_vma || vma->vm_ops)
+#if !USE_SPLIT_PMD_PTLOCKS
+	/* If no pgtable is deposited, there is no huge pmd to worry about */
+	if (!vma->vm_mm->pmd_huge_pte)
 		return;
+#endif
 	__vma_adjust_trans_huge(vma, start, end, adjust_next);
 }
 static inline int hpage_nr_pages(struct page *page)
--- thpfs.orig/mm/huge_memory.c	2015-02-20 19:33:51.492038431 -0800
+++ thpfs/mm/huge_memory.c	2015-02-20 19:34:32.367944969 -0800
@@ -2905,11 +2905,8 @@ again:
 void split_huge_page_pmd_mm(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd)
 {
-	struct vm_area_struct *vma;
-
-	vma = find_vma(mm, address);
-	BUG_ON(vma == NULL);
-	split_huge_page_pmd(vma, address, pmd);
+	if (unlikely(pmd_trans_huge(*pmd)))
+		__split_huge_page_pmd(find_vma(mm, address), address, pmd);
 }
 
 static void split_huge_page_address(struct mm_struct *mm,
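
For readers who want to see the deposit-count fast path in isolation, here
is a minimal user-space sketch.  It is not kernel code: struct toy_mm and
every toy_*() name are hypothetical stand-ins invented for illustration, and
locking (mm->page_table_lock in the patch) is left out entirely.  It only
models the idea that a zero deposit count means no huge pmd can exist, so
the expensive adjust path may be skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of mm_struct. */
struct toy_mm {
	unsigned long deposited;	/* plays the role of pmd_huge_pte as a count */
	unsigned long slow_calls;	/* how many times the slow path ran */
};

/* Models pgtable_trans_huge_deposit(): one spare page table per huge pmd. */
static inline void toy_deposit(struct toy_mm *mm)
{
	mm->deposited++;
}

/* Models pgtable_trans_huge_withdraw(): taken back when the pmd is split. */
static inline void toy_withdraw(struct toy_mm *mm)
{
	assert(mm->deposited > 0);
	mm->deposited--;
}

/* Stand-in for __vma_adjust_trans_huge(); here it only counts invocations. */
static inline void toy_adjust_slow(struct toy_mm *mm)
{
	mm->slow_calls++;
}

/*
 * Models vma_adjust_trans_huge() in the !USE_SPLIT_PMD_PTLOCKS case:
 * returns true if the slow path had to run, false if it was skipped.
 */
static inline bool toy_adjust(struct toy_mm *mm)
{
	/* If nothing is deposited, there is no huge pmd to worry about. */
	if (!mm->deposited)
		return false;
	toy_adjust_slow(mm);
	return true;
}
```

With a freshly zeroed toy_mm, toy_adjust() skips the slow path; after one
toy_deposit() it takes it; after the matching toy_withdraw() it skips again.
The real code depends on the count being updated under the page table lock,
which this sketch does not attempt to model.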