On Thu, Feb 20, 2025 at 01:48:18PM +0100, David Frank wrote:
> I'd like to efficiently mmap a large sparse file (ext4), 95% of which
> is holes. I was unsatisfied with the performance and after profiling,
> I found that most of the time is spent in filemap_add_folio and
> filemap_alloc_folio - much more than in my algorithm:
>
> - 97.87% filemap_fault
>    - 97.57% do_sync_mmap_readahead
>       - page_cache_ra_order
>          - 97.28% page_cache_ra_unbounded
>             - 40.80% filemap_add_folio
>                + 21.93% __filemap_add_folio
>                + 8.88% folio_add_lru
>                + 7.56% workingset_refault
>             + 28.73% filemap_alloc_folio
>             + 22.34% read_pages
>             + 3.29% xa_load

Yes, this is expected.  The fundamental problem is that we don't have
the sparseness information at the right point.  The read request (or
page fault) comes in, the VFS allocates a page, puts it in the page
cache, then asks the filesystem to fill it.  The filesystem knows the
range is a hole, so it could theoretically tell the VFS "Oh, this is a
hole", but by this point the damage is done -- the page has already
been allocated and added to the page cache.

Of course, this is a soluble problem.  The VFS could ask the filesystem
for its sparseness information (as you do in userspace), but unlike
your particular use case, the kernel must handle attackers who are
trying to make it do the wrong thing, as well as ill-timed writes.
So the VFS has to ensure it does not act on stale data from the
filesystem.

This is a problem I'm somewhat interested in solving, but I'm a bit
busy with folios right now.  And once that project is done, improving
the page cache for reflinked files is next on my list, so I'm not
likely to get to this problem for a few years.
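For reference, here is a minimal sketch of the userspace approach
mentioned above: walking a file's data extents with lseek(SEEK_DATA)
and lseek(SEEK_HOLE) so the application can skip the holes itself
instead of faulting them into the page cache.  The function name is
illustrative, not from any existing tool; error handling is minimal.
Note that a filesystem which doesn't track holes is allowed to report
the entire file as one data extent.

```c
#define _GNU_SOURCE		/* SEEK_DATA / SEEK_HOLE on glibc */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Print each data extent of fd in [0, size) and return how many were
 * found.  lseek(SEEK_DATA) fails with ENXIO once only holes remain.
 */
static int print_data_extents(int fd, off_t size)
{
	off_t off = 0;
	int n = 0;

	while (off < size) {
		off_t data = lseek(fd, off, SEEK_DATA);
		if (data < 0)
			break;		/* ENXIO: rest of the file is a hole */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			break;
		printf("data extent: [%lld, %lld)\n",
		       (long long)data, (long long)hole);
		n++;
		off = hole;
	}
	return n;
}
```

With the extent list in hand, the application can mmap (or pread) only
the data ranges and treat everything else as zeros, which avoids the
filemap_alloc_folio/filemap_add_folio cost for the 95% of the file
that is holes.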