On Thu, Jun 13, 2024 at 10:16:10AM +0200, David Hildenbrand wrote:
> On 13.06.24 10:13, Luis Chamberlain wrote:
> > On Thu, Jun 13, 2024 at 10:07:15AM +0200, David Hildenbrand wrote:
> > > On 13.06.24 09:57, Luis Chamberlain wrote:
> > > > On Wed, Jun 12, 2024 at 08:08:15PM +0100, Matthew Wilcox wrote:
> > > > > On Fri, Jun 07, 2024 at 02:58:57PM +0000, Pankaj Raghav (Samsung) wrote:
> > > > > > From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> > > > > >
> > > > > > Usually the page cache does not extend beyond the size of the inode,
> > > > > > therefore, no PTEs are created for folios that extend beyond the size.
> > > > > >
> > > > > > But with LBS support, we might extend page cache beyond the size of the
> > > > > > inode as we need to guarantee folios of minimum order. Cap the PTE range
> > > > > > to be created for the page cache up to the max allowed zero-fill file
> > > > > > end, which is aligned to the PAGE_SIZE.
> > > > >
> > > > > I think this is slightly misleading because we might well zero-fill
> > > > > to the end of the folio. The issue is that we're supposed to SIGBUS
> > > > > if userspace accesses pages which lie entirely beyond the end of this
> > > > > file. Can you rephrase this?
> > > > >
> > > > > (from mmap(2))
> > > > >   SIGBUS Attempted access to a page of the buffer that lies beyond the end
> > > > >          of the mapped file. For an explanation of the treatment of the
> > > > >          bytes in the page that corresponds to the end of a mapped file
> > > > >          that is not a multiple of the page size, see NOTES.
> > > > >
> > > > > The code is good though.
> > > > >
> > > > > Reviewed-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > > >
> > > > Since I've been curating the respective fstests test to test for this
> > > > POSIX corner case [0] I wanted to enable the test for tmpfs instead of
> > > > skipping it as I originally had it, and that meant also realizing mmap(2)
> > > > specifically says this now:
> > > >
> > > >   Huge page (Huge TLB) mappings
> > >
> > > Confusion alert: this likely talks about hugetlb (MAP_HUGETLB), not THP and
> > > friends.
> > >
> > > So it might not be required for below changes.
> >
> > Thanks, I had to ask as we're dusting off this little obscure corner of
> > the universe. Reason I ask, is the test fails for tmpfs with huge pages,
> > and this patch fixes it, but it got me wondering the above applies also
> > to tmpfs with huge pages.
>
> Is it tmpfs with THP/large folios or shmem with hugetlb? I assume the tmpfs
> with THP. There are not really mmap/munmap restrictions to THP and friends
> (because it's supposed to be "transparent" :) ).

The case I tested that failed the test was tmpfs with huge pages (not
large folios). So should we then have this:

diff --git a/mm/filemap.c b/mm/filemap.c
index ea78963f0956..649beb9bbc6b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3617,6 +3617,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	vm_fault_t ret = 0;
 	unsigned long rss = 0;
 	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
+	unsigned int align = PAGE_SIZE;
 
 	rcu_read_lock();
 	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
@@ -3636,7 +3637,16 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		goto out;
 	}
 
-	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
+	/*
+	 * As per the mmap(2) man page, the offset must be a multiple of the
+	 * underlying huge page size. The system automatically aligns length to
+	 * be a multiple of the underlying huge page size.
+	 */
+	if (folio_test_pmd_mappable(folio) &&
+	    (shmem_mapping(mapping) || folio_test_hugetlb(folio)))
+		align = 1 << folio_order(folio);
+
+	file_end = DIV_ROUND_UP(i_size_read(mapping->host), align) - 1;
 	if (end_pgoff > file_end)
 		end_pgoff = file_end;
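
For reference, here is a minimal userspace sketch of the POSIX corner case
being discussed (it is not the fstests test [0]; the temp-file path, the
half-page file size and the two-page mapping length are arbitrary choices
for illustration). Bytes in the final partial page of the file read back as
zero, while touching a page of the mapping that lies entirely beyond EOF is
expected to raise SIGBUS, which is the behaviour the quoted mmap(2) text
describes and the behaviour in question for tmpfs with huge pages:

/* sigbus-demo.c: build with cc -o sigbus-demo sigbus-demo.c */
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf env;

static void bus_handler(int sig)
{
	(void)sig;
	siglongjmp(env, 1);
}

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t fsize = page / 2;	/* EOF in the middle of the first page */
	size_t maplen = 2 * page;	/* map one page past the EOF page */
	char path[] = "/tmp/sigbus-demo-XXXXXX";
	int fd = mkstemp(path);
	struct sigaction sa = { 0 };
	char *p;

	if (fd < 0 || ftruncate(fd, fsize) < 0) {
		perror("setup");
		return 1;
	}

	p = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	sa.sa_handler = bus_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, NULL);

	/* Past EOF but within the last file-backed page: zero-filled. */
	printf("byte just past EOF (same page): %d\n", p[fsize]);

	/* A page that lies entirely beyond EOF must deliver SIGBUS. */
	if (sigsetjmp(env, 1) == 0)
		printf("page beyond EOF: %d (no SIGBUS!)\n", p[page]);
	else
		printf("got SIGBUS touching the page beyond EOF, as expected\n");

	munmap(p, maplen);
	close(fd);
	unlink(path);
	return 0;
}

Run against a file on a tmpfs mount to exercise the case above; whether the
second access still raises SIGBUS when the mapping is backed by a huge page
is exactly the question for the diff.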