ARM processors with LPAE enabled use 3 levels of page tables, with an
entry in the top level (pgd/pud) covering 1GB of virtual space. Because
of the relocation limitations on ARM, loadable modules are mapped 16MB
below PAGE_OFFSET, so the corresponding 1GB pgd/pud entry is shared
between kernel modules and user space.

During fault processing, the pmd entries corresponding to modules are
populated to point to the init_mm pte tables.

Since free_pgtables() is called with ceiling == 0 (meaning no upper
limit) when there is no next vma, free_pgd_range() (and the functions
it calls) also clears the pgd/pud entry that is shared between user
space and kernel modules. If a module interrupt routine is invoked
during this window, the kernel gets a translation fault and becomes
confused.

There is a proposed fix for ARM (within the arch/arm/ code), but it
would not be needed if pgd range freeing were capped at TASK_SIZE. The
concern is that some architectures have vmas beyond TASK_SIZE, so the
aim of this RFC is to ask whether those architectures rely on
free_pgtables() to free any page tables beyond TASK_SIZE.

Alternatively, we could define something like LAST_USER_ADDRESS,
defaulting to 0 for most architectures.

Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Russell King <linux@xxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
---
 fs/exec.c |    4 ++--
 mm/mmap.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 92ce83a..f2d66ab 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -626,7 +626,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		 * when the old and new regions overlap clear from new_end.
 		 */
 		free_pgd_range(&tlb, new_end, old_end, new_end,
-			vma->vm_next ? vma->vm_next->vm_start : 0);
+			vma->vm_next ? vma->vm_next->vm_start : TASK_SIZE);
 	} else {
 		/*
 		 * otherwise, clean from old_start; this is done to not touch
@@ -635,7 +635,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		 * for the others its just a little faster.
 		 */
 		free_pgd_range(&tlb, old_start, old_end, new_end,
-			vma->vm_next ? vma->vm_next->vm_start : 0);
+			vma->vm_next ? vma->vm_next->vm_start : TASK_SIZE);
 	}
 	tlb_finish_mmu(&tlb, new_end, old_end);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f758c7..5e5c8a8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1866,7 +1866,7 @@ static void unmap_region(struct mm_struct *mm,
 	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
-				 next ? next->vm_start : 0);
+				 next ? next->vm_start : TASK_SIZE);
 	tlb_finish_mmu(&tlb, start, end);
 }
 
@@ -2241,7 +2241,7 @@ void exit_mmap(struct mm_struct *mm)
 	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 
-	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
+	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, TASK_SIZE);
 	tlb_finish_mmu(&tlb, 0, end);
 
 	/*