* Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [240709 10:42]:
> Cc'ing Dave Hansen on this.

Really adding Dave to the discussion.

>
> * Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [240708 16:43]:
> > * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [240708 08:53]:
> > > On Thu, Jul 04, 2024 at 02:27:18PM GMT, Liam R. Howlett wrote:
> > > > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> > > >
> > > > The MAP_FIXED page count is available after the vms_gather_munmap_vmas()
> > > > call, so use it instead of looping over the vmas twice.
> > >
> > > Predictably indeed you removed the thing I commented on in the last patch
> > > ;) but at least this time I predicted it! ;)
> > >
> > > >
> > > > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > > > ---
> > > >  mm/mmap.c | 36 ++++--------------------------------
> > > >  1 file changed, 4 insertions(+), 32 deletions(-)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index b2de26683903..62edaabf3987 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > ...
>
> > > >  static void __vma_link_file(struct vm_area_struct *vma,
> > > >  			    struct address_space *mapping)
> > > >  {
> > > > @@ -2946,17 +2925,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > >  	pgoff_t vm_pgoff;
> > > >  	int error = -ENOMEM;
> > > >  	VMA_ITERATOR(vmi, mm, addr);
> > > > -	unsigned long nr_pages, nr_accounted;
> > > > -
> > > > -	nr_pages = count_vma_pages_range(mm, addr, end, &nr_accounted);
> > > > -
> > > > -	/* Check against address space limit. */
> > > > -	/*
> > > > -	 * MAP_FIXED may remove pages of mappings that intersects with requested
> > > > -	 * mapping. Account for the pages it would unmap.
> > > > -	 */
> > > > -	if (!may_expand_vm(mm, vm_flags, pglen - nr_pages))
> > > > -		return -ENOMEM;
> > > >
> > > >  	if (unlikely(!can_modify_mm(mm, addr, end)))
> > > >  		return -EPERM;
> > > > @@ -2987,6 +2955,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > >  		vma_iter_next_range(&vmi);
> > > >  	}
> > > >
> > > > +	/* Check against address space limit. */
> > > > +	if (!may_expand_vm(mm, vm_flags, pglen - vms.nr_pages))
> > > > +		goto abort_munmap;
> > > > +
> > >
> > > I know you can literally only do this after the vms_gather_munmap_vmas(),
> > > but this does change where we check this, so for instance we do
> > > arch_unmap() without having checked may_expand_vm().
> > >
> > > However I assume this is fine?
> >
> > Thanks for pointing this out.
> >
> > The functionality here has changed
> > --- from ---
> >   may_expand_vm() check
> >   can_modify_mm() check
> >   arch_unmap()
> >   vms_gather_munmap_vmas()
> >   ...
> >
> > --- to ---
> >   can_modify_mm() check
> >   arch_unmap()
> >   vms_gather_munmap_vmas()
> >   may_expand_vm() check
> >   ...
> >
> > vms_gather_munmap_vmas() does nothing but figure out what to do later,
> > but could use memory and can fail.
> >
> > The user implications are:
> >
> > 1. The return value on the error may change to -EPERM from -ENOMEM, if
> > you are not allowed to expand and are trying to overwrite mseal()'ed
> > VMAs. That seems so very rare that I'm not sure it's worth mentioning.
> >
> >
> > 2. arch_unmap() called prior to may_expand_vm().
> > powerpc uses this to set mm->context.vdso = NULL if mm->context.vdso is
> > within the unmap range. The user implication of this is that an
> > application may set the vdso to NULL prior to hitting the -ENOMEM case in
> > may_expand_vm() due to the address space limit.
> >
> > Assuming the removal of the vdso does not cause the application to seg
> > fault, then the user visible change is that any vdso call after a failed
> > mmap(MAP_FIXED) call would result in a seg fault.
> > The only reason it would fail is if the mapping process was attempting
> > to map a large enough area over the vdso (which is accounted and in
> > the vma tree, afaict) and ran out of memory. Note that this situation
> > could arise already since we could run out of memory (not accounting)
> > after the arch_unmap() call within the kernel.
> >
> > The code today can suffer the same fate, but not by the accounting
> > failure. It can happen due to failure to allocate a new vma,
> > do_vmi_munmap() failure after the arch_unmap() call, or any of the other
> > failure scenarios later in the mmap_region() function.
> >
> > At the very least, this requires an expanded change log.
>
> After doing a deep dive into the vdso issue, I think it would be best to
> remove the arch_unmap() call completely in a later patch set by changing
> the two areas highlighted by Dave in patch 5a28fc94c914 "x86/mpx,
> mm/core: Fix recursive munmap() corruption" back in 2019 in regards to
> the powerpc pointer use. But that's for later work.
>
> In the above mentioned patch, the arch_unmap() was moved to an earlier
> time to avoid removing the same vma twice from the rbtree. Since the
> mpx code no longer removes the vma and powerpc never removed the vma, it
> seems safe to reorder the calls as such:
>
>   can_modify_mm() check
>   vms_gather_munmap_vmas()
>   may_expand_vm() check
>   arch_unmap()
>
> This seems very much fine because:
> - powerpc is the only platform doing _anything_ in arch_unmap().
> - powerpc used to work with the arch_unmap() call after the vma was
>   completely dropped.
> - The vma isn't even dropped by this point and so all proposed changes
>   will be completely undone in the rare case of may_expand_vm() failure.
> - The arch_unmap() call doesn't need to be that early anymore anyways
>   (mpx was dropped by Dave in 2020 git id ccaaaf6fe5a5).
>
> I will make the order change in v4 of the patch series in its own patch.
>
> Thanks,
> Liam
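
For anyone following along, here is a rough user-space sketch of why the ordering matters. It is not kernel code: struct toy_mm, toy_may_expand(), toy_arch_unmap() and the two toy_mmap_region_*() helpers are hypothetical stand-ins made up for illustration, and the model assumes the unmapped page count never exceeds the requested length. It only contrasts the ordering in this version of the patch (arch_unmap() before the may_expand_vm() check) with the ordering proposed for v4 (arch_unmap() last).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the few mm fields that matter here; not the kernel's mm_struct. */
struct toy_mm {
	unsigned long total_pages;	/* pages currently mapped */
	unsigned long limit_pages;	/* stand-in for the address space limit */
	const void *vdso;		/* stand-in for powerpc mm->context.vdso */
	bool vdso_in_range;		/* request overlaps the vdso mapping */
	bool sealed_in_range;		/* request overlaps an mseal()'ed VMA */
};

/* Hypothetical stand-in for may_expand_vm(); assumes nr_unmapped <= pglen. */
static bool toy_may_expand(const struct toy_mm *mm, unsigned long pglen,
			   unsigned long nr_unmapped)
{
	return mm->total_pages + (pglen - nr_unmapped) <= mm->limit_pages;
}

/* Hypothetical stand-in for powerpc's arch_unmap(): drop the vdso pointer. */
static void toy_arch_unmap(struct toy_mm *mm)
{
	if (mm->vdso_in_range)
		mm->vdso = NULL;
}

/* Ordering in this version: arch_unmap() runs before the limit check. */
static int toy_mmap_region_v3(struct toy_mm *mm, unsigned long pglen,
			      unsigned long nr_unmapped)
{
	if (mm->sealed_in_range)	/* can_modify_mm() check */
		return -EPERM;
	toy_arch_unmap(mm);		/* arch_unmap() */
	/* vms_gather_munmap_vmas() would produce nr_unmapped here */
	if (!toy_may_expand(mm, pglen, nr_unmapped))
		return -ENOMEM;		/* vdso pointer is already gone */
	mm->total_pages += pglen - nr_unmapped;
	return 0;
}

/* Ordering proposed for v4: arch_unmap() runs after the limit check. */
static int toy_mmap_region_v4(struct toy_mm *mm, unsigned long pglen,
			      unsigned long nr_unmapped)
{
	if (mm->sealed_in_range)	/* can_modify_mm() check */
		return -EPERM;
	/* vms_gather_munmap_vmas() would produce nr_unmapped here */
	if (!toy_may_expand(mm, pglen, nr_unmapped))
		return -ENOMEM;		/* nothing has been modified yet */
	toy_arch_unmap(mm);		/* arch_unmap() */
	mm->total_pages += pglen - nr_unmapped;
	return 0;
}
```

In this toy model, a request that fails the limit check leaves the vdso pointer cleared under the first ordering and untouched under the second, which is the user-visible difference discussed above.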