Mike, On Mon, May 03, 2021 at 11:55:41AM -0700, Mike Kravetz wrote: > On 5/1/21 7:41 AM, Peter Xu wrote: > > F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day. > > There is a test program for that and it fails constantly. > > > > $ ./memfd_test hugetlbfs > > memfd-hugetlb: CREATE > > memfd-hugetlb: BASIC > > memfd-hugetlb: SEAL-WRITE > > memfd-hugetlb: SEAL-FUTURE-WRITE > > mmap() didn't fail as expected > > Aborted (core dumped) > > > > I think it's probably because no one is really running the hugetlbfs test. > > > > Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in > > shmem_mmap(). Generalize a helper for that. > > > > Reported-by: Hugh Dickins <hughd@xxxxxxxxxx> > > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx> > > --- > > fs/hugetlbfs/inode.c | 5 +++++ > > include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++ > > mm/shmem.c | 22 ++++------------------ > > 3 files changed, 41 insertions(+), 18 deletions(-) > > Thanks Peter and Hugh! > > One question below, > > > > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > > index a2a42335e8fd2..39922c0f2fc8c 100644 > > --- a/fs/hugetlbfs/inode.c > > +++ b/fs/hugetlbfs/inode.c > > @@ -131,10 +131,15 @@ static void huge_pagevec_release(struct pagevec *pvec) > > static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) > > { > > struct inode *inode = file_inode(file); > > + struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode); > > loff_t len, vma_len; > > int ret; > > struct hstate *h = hstate_file(file); > > > > + ret = seal_check_future_write(info->seals, vma); > > + if (ret) > > + return ret; > > + > > /* > > * vma address alignment (but not the pgoff alignment) has > > * already been checked by prepare_hugepage_range. If you add > > The full comment below the code you added is: > > /* > * vma address alignment (but not the pgoff alignment) has > * already been checked by prepare_hugepage_range. If you add > * any error returns here, do so after setting VM_HUGETLB, so > * is_vm_hugetlb_page tests below unmap_region go the right > * way when do_mmap unwinds (may be important on powerpc > * and ia64). > */ > > This comment was added in commit 68589bc35303 by Hugh, although it > appears David Gibson added the reason for the comment in the commit > message: > > "If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example, > because the given file offset is not hugepage aligned - then do_mmap_pgoff > will go to the unmap_and_free_vma backout path. > > But at this stage the vma hasn't been marked as hugepage, and the backout path > will call unmap_region() on it. That will eventually call down to the > non-hugepage version of unmap_page_range(). On ppc64, at least, that will > cause serious problems if there are any existing hugepage pagetable entries in > the vicinity - for example if there are any other hugepage mappings under the > same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud > entries. I suspect this will also cause bad problems on ia64, though I don't > have a machine to test it on." > > There are still comments in the unmap code about special handling of > ppc64 PUDs. So, this may still be an issue. > > I am trying to dig into the code to determine if this is still and > issue. Just curious if you looked into this? Might be simpler and > safer to just put the seal check after setting the VM_HUGETLB flag? Good catch! I overlooked on that, and I definitely didn't look into it yet. For now I'd better move that check to be after the flag settings in all cases. I'll also add: Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Thanks, -- Peter Xu