On 3/30/22 00:58, David Hildenbrand wrote:
> On 24.03.22 18:06, Mike Kravetz wrote:
>> hugetlbfs fallocate support was originally added with commit 70c3547e36f5
>> ("hugetlbfs: add hugetlbfs_fallocate()"). Initial support only operated
>> on whole hugetlb pages. This makes sense for populating files as other
>> interfaces such as mmap and truncate require hugetlb page size alignment.
>> Only operating on whole hugetlb pages for the hole punch case was a
>> simplification and there was no compelling use case to zero partial pages.
>>
>> In a recent discussion[1] it was assumed that hugetlbfs hole punch would
>> zero partial hugetlb pages as that is in line with the man page
>> description saying 'partial filesystem blocks are zeroed'. However,
>> the hugetlbfs hole punch code actually does this:
>>
>> 	hole_start = round_up(offset, hpage_size);
>> 	hole_end = round_down(offset + len, hpage_size);
>>
>> Modify code to zero partial hugetlb pages in hole punch range. It is
>> possible that application code could note a change in behavior. However,
>> that would imply the code is passing in an unaligned range and expecting
>> only whole pages be removed. This is unlikely as the fallocate
>> documentation states the opposite.
>
> Yeah, some weird code could behave differently; an app would have to
> pass in an unaligned range and expect that partially covered hugetlbfs
> pages remain unmodified. It's hard to think of reasonable apps that do
> that, but of course, some buggy code might then be *actually* broken.
> Like some messed-up align-up implementation that accidentally adds +1
> too much.
>
>>
>> The current hugetlbfs fallocate hole punch behavior is tested with the
>> libhugetlbfs test fallocate_align[2]. This test will be updated to
>> validate partial page zeroing.
>
> This is in line with other fallocate() behavior and documented
> semantics, so I think that's the right thing to do.
>
> I think it's worth giving it a try; it's hard to imagine that this
> actually breaks something.
>
>
> "After a successful call, subsequent reads from this range will return
> zeros." will work as expected with your change.
>
>>
>> [1] https://lore.kernel.org/linux-mm/20571829-9d3d-0b48-817c-b6b15565f651@xxxxxxxxxx/
>> [2] https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/fallocate_align.c
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>> ---
>>  fs/hugetlbfs/inode.c      | 67 ++++++++++++++++++++++++++++++---------
>>  include/asm-generic/tlb.h |  2 ++
>>  2 files changed, 54 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index a7c6c7498be0..f62ec4f71132 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -587,41 +587,78 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
>>  	remove_inode_hugepages(inode, offset, LLONG_MAX);
>>  }
>>
>> +static void hugetlbfs_zero_partial_page(struct hstate *h,
>> +					struct address_space *mapping,
>> +					unsigned long start,
>> +					unsigned long end)
>> +{
>> +	struct page *page;
>> +	pgoff_t idx = start >> huge_page_shift(h);
>
> I'm a fan of reverse Christmas trees :)
>

Ok, no preference by me, so I will change it.

...
>> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
>> index 2c68a545ffa7..4622ee45f739 100644
>> --- a/include/asm-generic/tlb.h
>> +++ b/include/asm-generic/tlb.h
>> @@ -562,6 +562,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
>>  		__tlb_remove_tlb_entry(tlb, ptep, address);	\
>>  	} while (0)
>>
>> +#ifndef tlb_remove_huge_tlb_entry
>>  #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
>>  	do {							\
>>  		unsigned long _sz = huge_page_size(h);		\
>> @@ -571,6 +572,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
>>  			tlb_flush_pud_range(tlb, address, _sz);	\
>>  		__tlb_remove_tlb_entry(tlb, ptep, address);	\
>>  	} while (0)
>> +#endif
>
> Was this change supposed to be included in this patch?

No, this should not have been included. Sorry.

Thanks for taking a look.
-- 
Mike Kravetz