Prepare for lockless PGD init: enable the arch_pgd_init_late() callback
and add a 'careful' implementation of PGD init to it: only copy over
non-zero entries.

Since PGD entries only ever get added, this method catches any updates
to swapper_pg_dir[] that might have occurred between early PGD init and
late PGD init.

Note that this only matters for code that does not use the pgd_list but
the task list to find all PGDs in the system. Subsequent patches will
convert pgd_list users to task-list iterations.

[ This adds extra overhead in that we do the PGD initialization for a
  second time - a later patch will simplify this, once we don't have
  old pgd_list users. ]

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
 arch/x86/Kconfig      |  1 +
 arch/x86/mm/pgtable.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7e39f9b22705..15c19ce149f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_HAS_PGD_INIT_LATE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fb0a9dd1d6e4..7a561b7cc01c 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -391,6 +391,65 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	return NULL;
 }
 
+/*
+ * Initialize the kernel portion of the PGD.
+ *
+ * This is done separately, because pgd_alloc() happens when
+ * the task is not on the task list yet - and PGD updates
+ * happen by walking the task list.
+ *
+ * No locking is needed here, as we just copy over the reference
+ * PGD. The reference PGD (pgtable_init) is only ever expanded
+ * at the highest, PGD level. Thus any other task extending it
+ * will first update the reference PGD, then modify the task PGDs.
+ */
+void arch_pgd_init_late(struct mm_struct *mm)
+{
+	/*
+	 * This function is called after a new MM has been made visible
+	 * in fork() or exec() via:
+	 *
+	 *   tsk->mm = mm;
+	 *
+	 * This barrier makes sure the MM is visible to new RCU
+	 * walkers before we initialize the pagetables below, so that
+	 * we don't miss updates:
+	 */
+	smp_wmb();
+
+	/*
+	 * If the pgd points to a shared pagetable level (either the
+	 * ptes in non-PAE, or shared PMD in PAE), then just copy the
+	 * references from swapper_pg_dir:
+	 */
+	if (CONFIG_PGTABLE_LEVELS == 2 ||
+	    (CONFIG_PGTABLE_LEVELS == 3 && SHARED_KERNEL_PMD) ||
+	    CONFIG_PGTABLE_LEVELS == 4) {
+
+		pgd_t *pgd_src = swapper_pg_dir + KERNEL_PGD_BOUNDARY;
+		pgd_t *pgd_dst = mm->pgd + KERNEL_PGD_BOUNDARY;
+		int i;
+
+		for (i = 0; i < KERNEL_PGD_PTRS; i++, pgd_src++, pgd_dst++) {
+			/*
+			 * This is lock-less, so it can race with PGD updates
+			 * coming from vmalloc() or CPA methods, but it's safe,
+			 * because:
+			 *
+			 *  1) this PGD is not in use yet, we have still not
+			 *     scheduled this task.
+			 *  2) we only ever extend PGD entries
+			 *
+			 * So if we observe a non-zero PGD entry we can copy it,
+			 * it won't change from under us. Parallel updates (new
+			 * allocations) will modify our (already visible) PGD:
+			 */
+			if (!pgd_none(*pgd_src))
+				set_pgd(pgd_dst, *pgd_src);
+		}
+	}
+}
+
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
 	pgd_mop_up_pmds(mm, pgd);
-- 
2.1.4
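
For reference, below is a rough sketch of the kind of task-list based
walker that the smp_wmb() in arch_pgd_init_late() is meant to pair with,
i.e. what the later pgd_list conversions are expected to look like. The
function name and the exact iteration primitives are illustrative only
and are not part of this patch:

	/*
	 * Illustrative sketch only - not part of this patch. Roughly how a
	 * task-list based variant of kernel PGD propagation (replacing
	 * today's pgd_list walkers such as sync_global_pgds()) could look.
	 */
	static void task_list_sync_kernel_pgd(unsigned long address)
	{
		pgd_t *pgd_ref = pgd_offset_k(address);
		struct task_struct *g, *t;

		/* Nothing to propagate if the reference entry is still empty: */
		if (pgd_none(*pgd_ref))
			return;

		rcu_read_lock();
		for_each_process_thread(g, t) {
			struct mm_struct *mm = READ_ONCE(t->mm);
			pgd_t *pgd;

			/* Kernel threads have no mm of their own: */
			if (!mm)
				continue;

			pgd = pgd_offset(mm, address);

			/*
			 * An mm made visible after the smp_wmb() in
			 * arch_pgd_init_late() is either seen by this walk or
			 * copies the already-updated swapper_pg_dir entry
			 * itself, so the entry gets filled in either way:
			 */
			if (pgd_none(*pgd))
				set_pgd(pgd, *pgd_ref);
		}
		rcu_read_unlock();
	}

The point is that once all pgd_list users are converted to such task-list
walks, a new mm is either visible to the walker or copies the updated
swapper_pg_dir[] entry in arch_pgd_init_late() - in both cases the kernel
part of its PGD ends up complete.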