On Tue, Jul 02, 2024 at 01:24:36PM GMT, Liam R. Howlett wrote: > * Lorenzo Stoakes <lstoakes@xxxxxxxxx> [240628 10:35]: > > The vma_shrink() and vma_expand() functions are internal VMA manipulation > > functions which we ought to abstract for use outside of memory management > > code. > > > > To achieve this, we abstract the operation performed in fs/exec.c by > > shift_arg_pages() into a new relocate_vma() function implemented in > > mm/mmap.c, which enables us to also move move_page_tables() and > > vma_iter_prev_range() to internal.h. > > > > The purpose of doing this is to isolate key VMA manipulation functions in > > order that we can both abstract them and later render them easily testable. > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@xxxxxxxxx> > > --- > > fs/exec.c | 68 ++------------------------------------ > > include/linux/mm.h | 17 +--------- > > mm/internal.h | 18 +++++++++++ > > mm/mmap.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++ > > 4 files changed, 102 insertions(+), 82 deletions(-) > > > > diff --git a/fs/exec.c b/fs/exec.c > > index 40073142288f..5cf53e20d8df 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -683,75 +683,11 @@ static int copy_strings_kernel(int argc, const char *const *argv, > > /* > > * During bprm_mm_init(), we create a temporary stack at STACK_TOP_MAX. Once > > * the binfmt code determines where the new stack should reside, we shift it to > > - * its final location. The process proceeds as follows: > > - * > > - * 1) Use shift to calculate the new vma endpoints. > > - * 2) Extend vma to cover both the old and new ranges. This ensures the > > - * arguments passed to subsequent functions are consistent. > > - * 3) Move vma's page tables to the new range. > > - * 4) Free up any cleared pgd range. > > - * 5) Shrink the vma to cover only the new range. > > + * its final location. > > */ > > static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift) > > { > > - struct mm_struct *mm = vma->vm_mm; > > - unsigned long old_start = vma->vm_start; > > - unsigned long old_end = vma->vm_end; > > - unsigned long length = old_end - old_start; > > - unsigned long new_start = old_start - shift; > > - unsigned long new_end = old_end - shift; > > - VMA_ITERATOR(vmi, mm, new_start); > > - struct vm_area_struct *next; > > - struct mmu_gather tlb; > > - > > - BUG_ON(new_start > new_end); > > - > > - /* > > - * ensure there are no vmas between where we want to go > > - * and where we are > > - */ > > - if (vma != vma_next(&vmi)) > > - return -EFAULT; > > - > > - vma_iter_prev_range(&vmi); > > - /* > > - * cover the whole range: [new_start, old_end) > > - */ > > - if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL)) > > - return -ENOMEM; > > - > > - /* > > - * move the page tables downwards, on failure we rely on > > - * process cleanup to remove whatever mess we made. > > - */ > > - if (length != move_page_tables(vma, old_start, > > - vma, new_start, length, false, true)) > > - return -ENOMEM; > > - > > - lru_add_drain(); > > - tlb_gather_mmu(&tlb, mm); > > - next = vma_next(&vmi); > > - if (new_end > old_start) { > > - /* > > - * when the old and new regions overlap clear from new_end. > > - */ > > - free_pgd_range(&tlb, new_end, old_end, new_end, > > - next ? next->vm_start : USER_PGTABLES_CEILING); > > - } else { > > - /* > > - * otherwise, clean from old_start; this is done to not touch > > - * the address space in [new_end, old_start) some architectures > > - * have constraints on va-space that make this illegal (IA64) - > > - * for the others its just a little faster. > > - */ > > - free_pgd_range(&tlb, old_start, old_end, new_end, > > - next ? next->vm_start : USER_PGTABLES_CEILING); > > - } > > - tlb_finish_mmu(&tlb); > > - > > - vma_prev(&vmi); > > - /* Shrink the vma to just the new range */ > > - return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff); > > + return relocate_vma(vma, shift); > > } > > The end result is a function that simply returns the results of your new > function. shift_arg_pages() is used once and mentioned in a single > comment in mm/mremap.c. I wonder if it's worth just dropping the > function entirely and just replacing the call to shift_arg_pages() to > relocate_vma()? Yeah that's a good idea, will do. Can move the comment over to the invocation also. > > I'm happy either way, the compiler should do the Right Thing(tm) the way > it is written. > > ... > > > + > > +/* > > + * Relocate a VMA downwards by shift bytes. There cannot be any VMAs between > > + * this VMA and its relocated range, which will now reside at [vma->vm_start - > > + * shift, vma->vm_end - shift). > > + * > > + * This function is almost certainly NOT what you want for anything other than > > + * early executable temporary stack relocation. > > + */ > > +int relocate_vma(struct vm_area_struct *vma, unsigned long shift) > > The name relocate_vma() implies it could be used in any direction, but > it can only shift downwards. This is also true for the > shift_arg_pages() as well and at least the comments state which way > things are going, and that the vma is also moving. > > It might be worth stating the pages are also being relocated in the > comment. > > Again, I'm happy enough to keep it this way but I wanted to point it > out. Yeah, amusingly I was thinking about this when I wrote this, but worried that relocate_vma_down() would be overwrought. However on second thoughts I think you're right, this isn't ideal. Will change. Also ack re: comment. > > ... > > Thanks, > Liam