On Mon, Oct 25, 2021 at 09:53:01AM -0500, Goldwyn Rodrigues wrote:
> On 2:43 23/10, Matthew Wilcox wrote:
> > On Fri, Oct 22, 2021 at 03:15:00PM -0500, Goldwyn Rodrigues wrote:
> > > This is an attempt to reduce the memory footprint by using a shared
> > > page(s) for shared extent(s) in the filesystem. I am hoping to start a
> > > discussion to iron out the details for implementation.
> >
> > When you say "Shared extents", you mean reflinks, which are COW, right?
>
> Yes, shared extents are extents which are shared on disk by two or more
> files. Yes, same as reflinks. Just to explain with an example:
>
> If two files, f1 and f2 have shared extent(s), and both files are read. Each
> file's mapping->i_pages will hold a copy of the contents of the shared
> extent on disk. So, f1->mapping will have one copy and f2->mapping will
> have another copy.
>
> For reads (and only reads), if we use underlying device's mapping, we
> can save on duplicate copy of the pages.

Yes; I'm familiar with the problem. Dave Chinner and I had a great
discussion about it at LCA a couple of years ago.

The implementation I've had in mind for a while is that the filesystem
either creates a separate inode for a shared extent, or (as you've done
here) uses the bdev's inode. We can discuss the pros/cons of that
separately.

To avoid the double-lookup problem, I was intending to generalise DAX
entries into PFN entries. That way, if the read() (or mmap read fault)
misses in the inode's cache, we can look up the shared extent cache, and
then cache the physical address of the memory in the inode.

That makes reclaim/eviction of the page in the shared extent more
expensive because you have to iterate all the inodes which share the
extent and remove the PFN entries before the page can be reused.

Perhaps we should have a Zoom meeting about this before producing
duelling patch series? I can host if you're interested.
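A minimal userspace sketch of the lookup and eviction flow described above, under a toy model. Every name here (`struct page`, `shared_cache`, `read_page()`, `evict_block()`, `register_inode()`) is hypothetical and is not a kernel API; real code would work on the page cache's XArray with DAX-style entries. A miss in a file's own slots falls through to one shared cache indexed by disk block, and the page pointer is then stored in the file as a stand-in for a PFN entry. Eviction must find and clear every such entry before the page can be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model; all names are hypothetical, not kernel APIs. */
#define NR_BLOCKS  8   /* disk blocks on the toy device */
#define NR_PAGES   4   /* page-cache slots per file */
#define MAX_INODES 4

struct page {
	int block;                 /* disk block whose contents this page holds */
};

/* Shared extent cache indexed by disk block; stands in for the bdev inode
 * (or a separate per-extent inode). */
static struct page *shared_cache[NR_BLOCKS];

struct inode {
	int block_map[NR_PAGES];   /* file page index -> disk block */
	struct page *pfn[NR_PAGES];/* stand-in for a PFN entry: points at the shared page */
};

static struct inode *inodes[MAX_INODES];
static int nr_inodes;
static int disk_reads;         /* counts simulated I/O */

static void register_inode(struct inode *ino)
{
	assert(nr_inodes < MAX_INODES);
	inodes[nr_inodes++] = ino;
}

/* read() path: the file's own cache first, then the shared extent cache. */
static struct page *read_page(struct inode *ino, int index)
{
	assert(index >= 0 && index < NR_PAGES);
	if (ino->pfn[index])
		return ino->pfn[index];            /* hit in the inode's cache */

	int block = ino->block_map[index];
	assert(block >= 0 && block < NR_BLOCKS);
	struct page *pg = shared_cache[block];
	if (!pg) {                             /* shared cache miss: "read" the disk */
		pg = malloc(sizeof(*pg));
		if (!pg)
			return NULL;
		pg->block = block;
		shared_cache[block] = pg;
		disk_reads++;
	}
	ino->pfn[index] = pg;                  /* cache the physical page in the inode */
	return pg;
}

/* Eviction: drop every inode's entry for this page before freeing it.
 * This model scans all registered inodes; that scan is the extra
 * reclaim cost. Returns the number of entries removed. */
static int evict_block(int block)
{
	assert(block >= 0 && block < NR_BLOCKS);
	struct page *pg = shared_cache[block];
	int dropped = 0;

	if (!pg)
		return 0;
	for (int i = 0; i < nr_inodes; i++)
		for (int j = 0; j < NR_PAGES; j++)
			if (inodes[i]->pfn[j] == pg) {
				inodes[i]->pfn[j] = NULL;
				dropped++;
			}
	shared_cache[block] = NULL;
	free(pg);
	return dropped;
}
```

For example, if f1 and f2 both map their page 0 to disk block 0, reading it through both files costs one simulated disk read and both files end up pointing at the same page; evicting block 0 then clears two entries. Tracking which inodes share each extent would narrow the scan in `evict_block()`, but the per-sharer work on reclaim remains.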