On Tue, Jan 11, 2022 at 06:30:31PM -0800, Hugh Dickins wrote:
> But I have to say that use of ZERO_PAGE for shmem/memfd/tmpfs read-fault
> might (potentially) be very welcome. Not as some MFD_ZEROPAGE special
> case, but as how it would always work. Deleting the shmem_recalc_inode()
> cruft, which is there to correct accounting for the unmodified read-only
> pages, after page reclaim has got around to freeing them later.
>
> It does require more work than you gave it in 1/1: mainly, as you call
> out above, there's a need to note in the mapping's XArray when ZERO_PAGE
> has been used at an offset, and do an rmap walk to unmap those ptes when
> a writable page is substituted - see __xip_unmap() in Linux 3.19's
> mm/filemap_xip.c for such an rmap walk.

I think putting a pointer to the zero page in the XArray would introduce
some unwelcome complexity, but the XArray has a special XA_ZERO_ENTRY
which might be usable for such a thing. It would need some careful
analysis and testing, of course, but it might also let us remove the
special cases in the DAX code for DAX_ZERO_PAGE.

I agree with you that temporarily allocating pages has worked "well
enough", but maybe some workloads would benefit; even for files on block
device filesystems, reading a hole and never writing to it may be common
enough that this is an optimisation we've been missing for many years.