On 10/8/21 11:32 AM, Mina Almasry wrote:
> Support mremap() for hugepage backed vma segment by simply repositioning
> page table entries. The page table entries are repositioned to the new
> virtual address on mremap().
>
> Hugetlb mremap() support is of course generic; my motivating use case
> is a library (hugepage_text), which reloads the ELF text of executables
> in hugepages. This significantly increases the execution performance of
> said executables.
>
> Restricts the mremap operation on hugepages to up to the size of the
> original mapping as the underlying hugetlb reservation is not yet
> capable of handling remapping to a larger size.
>
> During the mremap() operation we detect pmd_share'd mappings and we
> unshare those during the mremap(). On access and fault the sharing is
> established again.
>
> Signed-off-by: Mina Almasry <almasrymina@xxxxxxxxxx>
> ...
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6d2f4c25dd9fb..8200b4c8d09d8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1015,6 +1015,35 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
>  	vma->vm_private_data = (void *)0;
>  }
>
> +/*
> + * Reset and decrement one ref on hugepage private reservation.
> + * Called with mm->mmap_sem writer semaphore held.
> + * This function should be only used by move_vma() and operate on
> + * same sized vma. It should never come here with last ref on the
> + * reservation.
> + */
> +void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
> +{
> +	/*
> +	 * Clear the old hugetlb private page reservation.
> +	 * It has already been transferred to new_vma.
> +	 *
> +	 * During a mremap() operation of a hugetlb vma we call move_vma()
> +	 * which copies *vma* into *new_vma* and unmaps *vma*. After the copy
> +	 * operation both *new_vma* and *vma* share a reference to the resv_map
> +	 * struct, and at that point *vma* is about to be unmapped. We don't
> +	 * want to return the reservation to the pool at unmap of *vma* because
> +	 * the reservation still lives on in new_vma, so simply decrement the
> +	 * ref here and remove the resv_map reference from this vma.
> +	 */

Are the *...* for special formatting of the words somewhere?  Or, just
for simple added emphasis?  This convention is not used anywhere else in
the file.  Unless there is a good reason for doing so, I would prefer to
drop the *...* convention here.

> +	struct resv_map *reservations = vma_resv_map(vma);
> +
> +	if (reservations && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
> +		kref_put(&reservations->refs, resv_map_release);
> +
> +	reset_vma_resv_huge_pages(vma);
> +}
...
> diff --git a/mm/mremap.c b/mm/mremap.c
> index c0b6c41b7b78f..6a3f7d38b7539 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -489,6 +489,10 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  	old_end = old_addr + len;
>  	flush_cache_range(vma, old_addr, old_end);
>
> +	if (is_vm_hugetlb_page(vma))
> +		return move_hugetlb_page_tables(vma, new_vma, old_addr,
> +						new_addr, len);
> +
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
>  				old_addr, old_end);
>  	mmu_notifier_invalidate_range_start(&range);
> @@ -646,6 +650,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>  		mremap_userfaultfd_prep(new_vma, uf);
>  	}
>
> +	if (is_vm_hugetlb_page(vma)) {
> +		clear_vma_resv_huge_pages(vma);
> +	}
> +
>  	/* Conceal VM_ACCOUNT so old reservation is not undone */
>  	if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
>  		vma->vm_flags &= ~VM_ACCOUNT;
> @@ -739,9 +747,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>  	    (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
>  		return ERR_PTR(-EINVAL);
>
> -	if (is_vm_hugetlb_page(vma))
> -		return ERR_PTR(-EINVAL);
> -
>  	/* We can't remap across vm area boundaries */
>  	if (old_len > vma->vm_end - addr)
>  		return ERR_PTR(-EFAULT);
> @@ -937,6 +942,27 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>
>  	if (mmap_write_lock_killable(current->mm))
>  		return -EINTR;
> +	vma = find_vma(mm, addr);
> +	if (!vma || vma->vm_start > addr) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (is_vm_hugetlb_page(vma)) {
> +		struct hstate *h __maybe_unused = hstate_vma(vma);
> +
> +		old_len = ALIGN(old_len, huge_page_size(h));
> +		new_len = ALIGN(new_len, huge_page_size(h));
> +		addr = ALIGN(addr, huge_page_size(h));
> +		new_addr = ALIGN(new_addr, huge_page_size(h));

Instead of aligning addr and new_addr, we should be checking for huge
page alignment and returning an error.  This makes it consistent with
the requirement that they be PAGE aligned in the non-hugetlb case.
Sorry if that was unclear in previous comments.

	/* addrs must be huge page aligned */
	if (addr & ~huge_page_mask(h))
		goto out;
	if (new_addr & ~huge_page_mask(h))
		goto out;

> +
> +		/*
> +		 * Don't allow remap expansion, because the underlying hugetlb
> +		 * reservation is not yet capable of handling split
> +		 * reservations.
> +		 */
> +		if (new_len > old_len)
> +			goto out;
> +	}
>
>  	if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) {
>  		ret = mremap_to(addr, old_len, new_addr, new_len,
-- 
Mike Kravetz