On 5/3/21 2:31 PM, Peter Xu wrote: > Mike, > > On Mon, May 03, 2021 at 11:55:41AM -0700, Mike Kravetz wrote: >> On 5/1/21 7:41 AM, Peter Xu wrote: >>> F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day. >>> There is a test program for that and it fails constantly. >>> >>> $ ./memfd_test hugetlbfs >>> memfd-hugetlb: CREATE >>> memfd-hugetlb: BASIC >>> memfd-hugetlb: SEAL-WRITE >>> memfd-hugetlb: SEAL-FUTURE-WRITE >>> mmap() didn't fail as expected >>> Aborted (core dumped) >>> >>> I think it's probably because no one is really running the hugetlbfs test. >>> >>> Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in >>> shmem_mmap(). Generalize a helper for that. >>> >>> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx> >>> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx> >>> --- >>> fs/hugetlbfs/inode.c | 5 +++++ >>> include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++ >>> mm/shmem.c | 22 ++++------------------ >>> 3 files changed, 41 insertions(+), 18 deletions(-) >> >> Thanks Peter and Hugh! >> >> One question below, >> >>> >>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c >>> index a2a42335e8fd2..39922c0f2fc8c 100644 >>> --- a/fs/hugetlbfs/inode.c >>> +++ b/fs/hugetlbfs/inode.c >>> @@ -131,10 +131,15 @@ static void huge_pagevec_release(struct pagevec *pvec) >>> static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) >>> { >>> struct inode *inode = file_inode(file); >>> + struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode); >>> loff_t len, vma_len; >>> int ret; >>> struct hstate *h = hstate_file(file); >>> >>> + ret = seal_check_future_write(info->seals, vma); >>> + if (ret) >>> + return ret; >>> + >>> /* >>> * vma address alignment (but not the pgoff alignment) has >>> * already been checked by prepare_hugepage_range. If you add >> >> The full comment below the code you added is: >> >> /* >> * vma address alignment (but not the pgoff alignment) has >> * already been checked by prepare_hugepage_range. If you add >> * any error returns here, do so after setting VM_HUGETLB, so >> * is_vm_hugetlb_page tests below unmap_region go the right >> * way when do_mmap unwinds (may be important on powerpc >> * and ia64). >> */ >> >> This comment was added in commit 68589bc35303 by Hugh, although it >> appears David Gibson added the reason for the comment in the commit >> message: >> >> "If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example, >> because the given file offset is not hugepage aligned - then do_mmap_pgoff >> will go to the unmap_and_free_vma backout path. >> >> But at this stage the vma hasn't been marked as hugepage, and the backout path >> will call unmap_region() on it. That will eventually call down to the >> non-hugepage version of unmap_page_range(). On ppc64, at least, that will >> cause serious problems if there are any existing hugepage pagetable entries in >> the vicinity - for example if there are any other hugepage mappings under the >> same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud >> entries. I suspect this will also cause bad problems on ia64, though I don't >> have a machine to test it on." >> >> There are still comments in the unmap code about special handling of >> ppc64 PUDs. So, this may still be an issue. >> >> I am trying to dig into the code to determine if this is still and >> issue. Just curious if you looked into this? Might be simpler and >> safer to just put the seal check after setting the VM_HUGETLB flag? > > Good catch! I overlooked on that, and I definitely didn't look into it yet. > For now I'd better move that check to be after the flag settings in all cases. > > I'll also add: > > Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") > Thanks! With those changes, you can add, Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> -- Mike Kravetz