Subject: [merged] mm-use-paravirt-friendly-ops-for-numa-hinting-ptes.patch removed from -mm tree To: mgorman@xxxxxxx,aarcange@xxxxxxxxxx,dave.hansen@xxxxxxxxx,david.vrabel@xxxxxxxxxx,fengguang.wu@xxxxxxxxx,gorcunov@xxxxxxxxx,hpa@xxxxxxxxx,mingo@xxxxxxxxxx,peterz@xxxxxxxxxxxxx,riel@xxxxxxxxxx,srikar@xxxxxxxxxxxxxxxxxx,stable@xxxxxxxxxxxxxxx,steven@xxxxxxxxxxxxxx,torvalds@xxxxxxxxxxxxxxxxxxxx,mm-commits@xxxxxxxxxxxxxxx From: akpm@xxxxxxxxxxxxxxxxxxxx Date: Mon, 21 Apr 2014 11:11:09 -0700 The patch titled Subject: mm: use paravirt friendly ops for NUMA hinting ptes has been removed from the -mm tree. Its filename was mm-use-paravirt-friendly-ops-for-numa-hinting-ptes.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Mel Gorman <mgorman@xxxxxxx> Subject: mm: use paravirt friendly ops for NUMA hinting ptes David Vrabel identified a regression when using automatic NUMA balancing under Xen whereby page table entries were getting corrupted due to the use of native PTE operations. Quoting him Xen PV guest page tables require that their entries use machine addresses if the preset bit (_PAGE_PRESENT) is set, and (for successful migration) non-present PTEs must use pseudo-physical addresses. This is because on migration MFNs in present PTEs are translated to PFNs (canonicalised) so they may be translated back to the new MFN in the destination domain (uncanonicalised). pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set and clear the _PAGE_PRESENT bit using pte_set_flags(), pte_clear_flags(), etc. In a Xen PV guest, these functions must translate MFNs to PFNs when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting _PAGE_PRESENT. His suggested fix converted p[te|md]_[set|clear]_flags to using paravirt-friendly ops but this is overkill. He suggested an alternative of using p[te|md]_modify in the NUMA page table operations but this is does more work than necessary and would require looking up a VMA for protections. This patch modifies the NUMA page table operations to use paravirt friendly operations to set/clear the flags of interest. Unfortunately this will take a performance hit when updating the PTEs on CONFIG_PARAVIRT but I do not see a way around it that does not break Xen. Signed-off-by: Mel Gorman <mgorman@xxxxxxx> Acked-by: David Vrabel <david.vrabel@xxxxxxxxxx> Tested-by: David Vrabel <david.vrabel@xxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Peter Anvin <hpa@xxxxxxxxx> Cc: Fengguang Wu <fengguang.wu@xxxxxxxxx> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Cc: Steven Noonan <steven@xxxxxxxxxxxxxx> Cc: Rik van Riel <riel@xxxxxxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: Dave Hansen <dave.hansen@xxxxxxxxx> Cc: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- include/asm-generic/pgtable.h | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff -puN include/asm-generic/pgtable.h~mm-use-paravirt-friendly-ops-for-numa-hinting-ptes include/asm-generic/pgtable.h --- a/include/asm-generic/pgtable.h~mm-use-paravirt-friendly-ops-for-numa-hinting-ptes +++ a/include/asm-generic/pgtable.h @@ -693,24 +693,35 @@ static inline int pmd_numa(pmd_t pmd) #ifndef pte_mknonnuma static inline pte_t pte_mknonnuma(pte_t pte) { - pte = pte_clear_flags(pte, _PAGE_NUMA); - return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED); + pteval_t val = pte_val(pte); + + val &= ~_PAGE_NUMA; + val |= (_PAGE_PRESENT|_PAGE_ACCESSED); + return __pte(val); } #endif #ifndef pmd_mknonnuma static inline pmd_t pmd_mknonnuma(pmd_t pmd) { - pmd = pmd_clear_flags(pmd, _PAGE_NUMA); - return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED); + pmdval_t val = pmd_val(pmd); + + val &= ~_PAGE_NUMA; + val |= (_PAGE_PRESENT|_PAGE_ACCESSED); + + return __pmd(val); } #endif #ifndef pte_mknuma static inline pte_t pte_mknuma(pte_t pte) { - pte = pte_set_flags(pte, _PAGE_NUMA); - return pte_clear_flags(pte, _PAGE_PRESENT); + pteval_t val = pte_val(pte); + + val &= ~_PAGE_PRESENT; + val |= _PAGE_NUMA; + + return __pte(val); } #endif @@ -729,8 +740,12 @@ static inline void ptep_set_numa(struct #ifndef pmd_mknuma static inline pmd_t pmd_mknuma(pmd_t pmd) { - pmd = pmd_set_flags(pmd, _PAGE_NUMA); - return pmd_clear_flags(pmd, _PAGE_PRESENT); + pmdval_t val = pmd_val(pmd); + + val &= ~_PAGE_PRESENT; + val |= _PAGE_NUMA; + + return __pmd(val); } #endif _ Patches currently in -mm which might be from mgorman@xxxxxxx are hugetlb-ensure-hugepage-access-is-denied-if-hugepages-are-not-supported.patch hugetlb-ensure-hugepage-access-is-denied-if-hugepages-are-not-supported-fix.patch x86-require-x86-64-for-automatic-numa-balancing.patch x86-define-_page_numa-by-reusing-software-bits-on-the-pmd-and-pte-levels.patch x86-define-_page_numa-by-reusing-software-bits-on-the-pmd-and-pte-levels-fix-2.patch mm-introduce-do_shared_fault-and-drop-do_fault-fix-fix.patch mm-compactionc-isolate_freepages_block-small-tuneup.patch mm-only-force-scan-in-reclaim-when-none-of-the-lrus-are-big-enough.patch mm-huge_memoryc-complete-conversion-to-pr_foo.patch mm-disable-zone_reclaim_mode-by-default.patch mm-page_alloc-do-not-cache-reclaim-distances.patch mm-page_alloc-prevent-migrate_reserve-pages-from-being-misplaced.patch mm-page_alloc-debug_vm-checks-for-free_list-placement-of-cma-and-reserve-pages.patch do_shared_fault-check-that-mmap_sem-is-held.patch linux-next.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html