[PATCH RFC] x86,mm: use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA

In commit b38af4721 ("x86,mm: fix pte_special versus pte_numa"), pte_special()
(SPECIAL with PRESENT or PROTNONE) was made to complement pte_numa()
(SPECIAL with neither PRESENT nor PROTNONE). That broke Xen PV guests
with NUMA balancing support.

That's because the Xen hypervisor sets _PAGE_GLOBAL (_PAGE_GLOBAL /
_PAGE_PROTNONE in Linux) for guest user space mappings. So in a Xen PV
guest, when NUMA balancing is enabled, a NUMA hinted PTE ends up
"SPECIAL (in fact NUMA) with PROTNONE but not PRESENT", which makes
pte_special() return true when it shouldn't.

Fundamentally we only need _PAGE_NUMA and _PAGE_PRESENT to tell the
difference between an unmapped entry and an entry protected for NUMA
hinting faults. So use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA, and adjust
_PAGE_NUMA_MASK and SWP_OFFSET_SHIFT as needed.

Suggested-by: David Vrabel <david.vrabel@xxxxxxxxxx>
Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
---
 arch/x86/include/asm/pgtable.h       |    5 -----
 arch/x86/include/asm/pgtable_64.h    |    2 +-
 arch/x86/include/asm/pgtable_types.h |    8 ++++----
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index aa97a07..8dee3ed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -131,11 +131,6 @@ static inline int pte_exec(pte_t pte)
 
 static inline int pte_special(pte_t pte)
 {
-	/*
-	 * See CONFIG_NUMA_BALANCING pte_numa in include/asm-generic/pgtable.h.
-	 * On x86 we have _PAGE_BIT_NUMA == _PAGE_BIT_GLOBAL+1 ==
-	 * __PAGE_BIT_SOFTW1 == _PAGE_BIT_SPECIAL.
-	 */
 	return (pte_flags(pte) & _PAGE_SPECIAL) &&
 		(pte_flags(pte) & (_PAGE_PRESENT|_PAGE_PROTNONE));
 }
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 4572b2f..26f2ade 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -148,7 +148,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #ifdef CONFIG_NUMA_BALANCING
 /* Automatic NUMA balancing needs to be distinguishable from swap entries */
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 2)
+#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 3)
 #else
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
 #endif
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 0778964..bc82d6b 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -31,9 +31,9 @@
  * Swap offsets on configurations that allow automatic NUMA balancing use the
  * bits after _PAGE_BIT_GLOBAL. To uniquely distinguish NUMA hinting PTEs from
  * swap entries, we use the first bit after _PAGE_BIT_GLOBAL and shrink the
- * maximum possible swap space from 16TB to 8TB.
+ * maximum possible swap space from 16TB to 4TB.
  */
-#define _PAGE_BIT_NUMA		(_PAGE_BIT_GLOBAL+1)
+#define _PAGE_BIT_NUMA		_PAGE_BIT_SOFTW2
 
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
@@ -325,8 +325,8 @@ static inline pteval_t pte_flags(pte_t pte)
 }
 
 #ifdef CONFIG_NUMA_BALANCING
-/* Set of bits that distinguishes present, prot_none and numa ptes */
-#define _PAGE_NUMA_MASK (_PAGE_NUMA|_PAGE_PROTNONE|_PAGE_PRESENT)
+/* Set of bits that distinguishes present and numa ptes */
+#define _PAGE_NUMA_MASK (_PAGE_NUMA|_PAGE_PRESENT)
 static inline pteval_t ptenuma_flags(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_NUMA_MASK;
-- 
1.7.10.4
