On Thu, Feb 20, 2025 at 01:48:18PM +0100, David Frank wrote:
> I'd like to efficiently mmap a large sparse file (ext4), 95% of which
> is holes. I was unsatisfied with the performance and after profiling,
> I found that most of the time is spent in filemap_add_folio and
> filemap_alloc_folio - much more than in my algorithm:
>
> - 97.87% filemap_fault
>    - 97.57% do_sync_mmap_readahead
>       - page_cache_ra_order
>          - 97.28% page_cache_ra_unbounded
>             - 40.80% filemap_add_folio
>                + 21.93% __filemap_add_folio
>                + 8.88% folio_add_lru
>                + 7.56% workingset_refault
>             + 28.73% filemap_alloc_folio
>             + 22.34% read_pages
>             + 3.29% xa_load

Yes, this is expected.  The fundamental problem is that we don't have
the sparseness information at the right point.  The read request (or
page fault) comes in, the VFS allocates a page, puts it in the page
cache, then asks the filesystem to fill it.  The filesystem knows the
range is a hole, so it could theoretically tell the VFS "Oh, this is a
hole", but by this point the damage is done -- the page has already
been allocated and added to the page cache.

Of course, this is a soluble problem.  The VFS could ask the filesystem
for its sparseness information (as you do in userspace), but unlike
your particular use case, the kernel must handle attackers who are
trying to make it do the wrong thing, as well as ill-timed writes.
So the VFS has to ensure it does not act on stale data from the
filesystem.

This is a problem I'm somewhat interested in solving, but I'm a bit
busy with folios right now.  And once that project is done, improving
the page cache for reflinked files is next on my list, so I'm not
likely to get to this problem for a few years.
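For reference, here is a minimal sketch of the userspace approach
mentioned above: walking a file's data extents with lseek(SEEK_DATA)
and lseek(SEEK_HOLE) so the application can skip the holes itself
instead of faulting them into the page cache.  The function name is
illustrative, not from any existing tool; error handling is minimal.
Note that a filesystem which doesn't track holes is allowed to report
the entire file as one data extent.

```c
#define _GNU_SOURCE		/* SEEK_DATA / SEEK_HOLE on glibc */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Print each data extent of fd in [0, size) and return how many were
 * found.  lseek(SEEK_DATA) fails with ENXIO once only holes remain.
 */
static int print_data_extents(int fd, off_t size)
{
	off_t off = 0;
	int n = 0;

	while (off < size) {
		off_t data = lseek(fd, off, SEEK_DATA);
		if (data < 0)
			break;		/* ENXIO: rest of the file is a hole */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			break;
		printf("data extent: [%lld, %lld)\n",
		       (long long)data, (long long)hole);
		n++;
		off = hole;
	}
	return n;
}
```

With the extent list in hand, the application can mmap (or pread) only
the data ranges and treat everything else as zeros, which avoids the
filemap_alloc_folio/filemap_add_folio cost for the 95% of the file
that is holes.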