Cc'ing Dave Hansen on this.

* Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [240708 16:43]:
> * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [240708 08:53]:
> > On Thu, Jul 04, 2024 at 02:27:18PM GMT, Liam R. Howlett wrote:
> > > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> > >
> > > The MAP_FIXED page count is available after the vms_gather_munmap_vmas()
> > > call, so use it instead of looping over the vmas twice.
> >
> > Predictably indeed you removed the thing I commented on in the last patch
> > ;) but at least this time I predicted it! ;)
> >
> > >
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > > ---
> > >  mm/mmap.c | 36 ++++--------------------------------
> > >  1 file changed, 4 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index b2de26683903..62edaabf3987 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
...
> > >  static void __vma_link_file(struct vm_area_struct *vma,
> > >  			    struct address_space *mapping)
> > >  {
> > > @@ -2946,17 +2925,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > >  	pgoff_t vm_pgoff;
> > >  	int error = -ENOMEM;
> > >  	VMA_ITERATOR(vmi, mm, addr);
> > > -	unsigned long nr_pages, nr_accounted;
> > > -
> > > -	nr_pages = count_vma_pages_range(mm, addr, end, &nr_accounted);
> > > -
> > > -	/* Check against address space limit. */
> > > -	/*
> > > -	 * MAP_FIXED may remove pages of mappings that intersects with requested
> > > -	 * mapping. Account for the pages it would unmap.
> > > -	 */
> > > -	if (!may_expand_vm(mm, vm_flags, pglen - nr_pages))
> > > -		return -ENOMEM;
> > >
> > >  	if (unlikely(!can_modify_mm(mm, addr, end)))
> > >  		return -EPERM;
> > > @@ -2987,6 +2955,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > >  		vma_iter_next_range(&vmi);
> > >  	}
> > >
> > > +	/* Check against address space limit. */
> > > +	if (!may_expand_vm(mm, vm_flags, pglen - vms.nr_pages))
> > > +		goto abort_munmap;
> > > +
> >
> > I know you can literally only do this after the vms_gather_munmap_vmas(),
> > but this does change where we check this, so for instance we do
> > arch_unmap() without having checked may_expand_vm().
> >
> > However I assume this is fine?
>
> Thanks for pointing this out.
>
> The functionality here has changed
> --- from ---
> may_expand_vm() check
> can_modify_mm() check
> arch_unmap()
> vms_gather_munmap_vmas()
> ...
>
> --- to ---
> can_modify_mm() check
> arch_unmap()
> vms_gather_munmap_vmas()
> may_expand_vm() check
> ...
>
> vms_gather_munmap_vmas() does nothing but figures out what to do later,
> but could use memory and can fail.
>
> The user implications are:
>
> 1. The return type on the error may change to -EPERM from -ENOMEM, if
> you are not allowed to expand and are trying to overwrite mseal()'ed
> VMAs. That seems so very rare that I'm not sure it's worth mentioning.
>
>
> 2. arch_unmap() called prior to may_expand_vm().
> powerpc uses this to set mm->context.vdso = NULL if mm->context.vdso is
> within the unmap range. User implication of this means that an
> application may set the vdso to NULL prior to hitting the -ENOMEM case in
> may_expand_vm() due to the address space limit.
>
> Assuming the removal of the vdso does not cause the application to seg
> fault, then the user visible change is that any vdso call after a failed
> mmap(MAP_FIXED) call would result in a seg fault. The only reason it
> would fail is if the mapping process was attempting to map a large
> enough area over the vdso (which is accounted and in the vma tree,
> afaict) and ran out of memory. Note that this situation could arise
> already since we could run out of memory (not accounting) after the
> arch_unmap() call within the kernel.
>
> The code today can suffer the same fate, but not by the accounting
> failure. It can happen due to failure to allocate a new vma,
> do_vmi_munmap() failure after the arch_unmap() call, or any of the other
> failure scenarios later in the mmap_region() function.
>
> At the very least, this requires an expanded change log.

After doing a deep dive into the vdso issue, I think it would be best to
remove the arch_unmap() call completely in a later patch set. That would
mean changing the two areas Dave highlighted, regarding the powerpc
pointer use, in commit 5a28fc94c914 ("x86/mpx, mm/core: Fix recursive
munmap() corruption") back in 2019. But that's for later work.

In the above-mentioned commit, the arch_unmap() call was moved earlier to
avoid removing the same vma twice from the rbtree. Since the mpx code no
longer removes the vma and powerpc never removed the vma, it seems safe
to reorder the calls as such:

can_modify_mm() check
vms_gather_munmap_vmas()
may_expand_vm() check
arch_unmap()

This seems very much fine because:
- powerpc is the only platform doing _anything_ in arch_unmap().
- powerpc used to work with the arch_unmap() call after the vma was
  completely dropped.
- The vma isn't even dropped by this point, so all proposed changes
  will be completely undone in the rare case of may_expand_vm() failure.
- The arch_unmap() call doesn't need to be that early anymore anyway
  (mpx was dropped by Dave in 2020, git id ccaaaf6fe5a5).

I will make the order change in v4 of the patch series in its own patch.

Thanks,
Liam
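
As an illustration of the reordering discussed above, here is a rough userspace model, not kernel code. Every name in it (toy_mm, toy_may_expand_vm(), toy_arch_unmap(), toy_mmap_region(), the TOY_ORDER_* values) is a hypothetical stand-in invented for this sketch. It assumes addresses and lengths are counted in pages, passes the page count that vms_gather_munmap_vmas() would compute as a plain parameter, and leaves out the can_modify_mm() check entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the bits of mm state discussed here (hypothetical). */
struct toy_mm {
	unsigned long total_vm;   /* pages currently mapped */
	unsigned long limit;      /* address space limit, in pages */
	unsigned long vdso_page;  /* page number where the vdso sits */
	void *vdso;               /* models powerpc mm->context.vdso */
};

/* Models the accounting check: would npages more pages fit the limit? */
static bool toy_may_expand_vm(const struct toy_mm *mm, long npages)
{
	return (long)mm->total_vm + npages <= (long)mm->limit;
}

/* Models the powerpc hook: forget the vdso if it lies in [start, end). */
static void toy_arch_unmap(struct toy_mm *mm, unsigned long start,
			   unsigned long end)
{
	if (mm->vdso && mm->vdso_page >= start && mm->vdso_page < end)
		mm->vdso = NULL;
}

enum toy_order {
	TOY_ORDER_AS_POSTED,  /* arch_unmap() before the limit check */
	TOY_ORDER_PROPOSED,   /* arch_unmap() after the limit check */
};

/*
 * Models a MAP_FIXED mapping of pglen pages at page addr.
 * nr_unmapped is the number of existing pages the mapping replaces,
 * i.e. what the gather step would report in the real code.
 */
static int toy_mmap_region(struct toy_mm *mm, unsigned long addr,
			   unsigned long pglen, unsigned long nr_unmapped,
			   enum toy_order order)
{
	unsigned long end = addr + pglen;

	assert(mm != NULL);

	if (order == TOY_ORDER_AS_POSTED)
		toy_arch_unmap(mm, addr, end);

	/* Only the net growth counts against the limit. */
	if (!toy_may_expand_vm(mm, (long)pglen - (long)nr_unmapped))
		return -ENOMEM;

	if (order == TOY_ORDER_PROPOSED)
		toy_arch_unmap(mm, addr, end);

	mm->total_vm += pglen - nr_unmapped;
	return 0;
}
```

In this model, a call that fails the limit check with TOY_ORDER_AS_POSTED has already cleared the vdso pointer, while the same failing call with TOY_ORDER_PROPOSED leaves it untouched. That is the property the reordering is meant to provide.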