On 2018/09/12 22:42, Michal Hocko wrote: > On Wed 12-09-18 09:50:54, Michal Hocko wrote: >> On Tue 11-09-18 23:01:57, Tetsuo Handa wrote: >>> On 2018/09/10 21:55, Michal Hocko wrote: >>>> This is a very coarse implementation of the idea I've had before. >>>> Please note that I haven't tested it yet. It is mostly to show the >>>> direction I would wish to go for. >>> >>> Hmm, this patchset does not allow me to boot. ;-) >>> >>> free_pgd_range(&tlb, vma->vm_start, vma->vm_prev->vm_end, >>> FIRST_USER_ADDRESS, USER_PGTABLES_CEILING); >>> >>> [ 1.875675] sched_clock: Marking stable (1810466565, 65169393)->(1977240380, -101604422) >>> [ 1.877833] registered taskstats version 1 >>> [ 1.877853] Loading compiled-in X.509 certificates >>> [ 1.878835] zswap: loaded using pool lzo/zbud >>> [ 1.880835] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 >> >> This is vm_prev == NULL. I thought we always have vm_prev as long as >> this is not a single VMA in the address space. I will double check this. > > So this is me misunderstanding the code. vm_next, vm_prev are not a full > doubly linked list. The first entry doesn't really refer to the last > entry. So the above cannot work at all. We can go around this in two > ways. Either keep the iteration or use the following which should cover > the full mapped range, unless I am missing something > > diff --git a/mm/mmap.c b/mm/mmap.c > index 64e8ccce5282..078295344a17 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -3105,7 +3105,7 @@ void exit_mmap(struct mm_struct *mm) > up_write(&mm->mmap_sem); > } > > - free_pgd_range(&tlb, vma->vm_start, vma->vm_prev->vm_end, > + free_pgd_range(&tlb, vma->vm_start, mm->highest_vm_end, > FIRST_USER_ADDRESS, USER_PGTABLES_CEILING); > tlb_finish_mmu(&tlb, 0, -1); > This is bad because architectures where hugetlb_free_pgd_range() does more than free_pgd_range() need to check VM_HUGETLB flag for each "vma". Thus, I think we need to keep the iteration.