On Thu, May 19, 2016 at 12:17:14PM +0200, Miklos Szeredi wrote:
> On Thu, May 19, 2016 at 11:05 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Thu 19-05-16 10:20:13, Miklos Szeredi wrote:
> >> Has anyone thought about sharing pages between multiple files?
> >>
> >> The obvious application is for COW filesystems where there are
> >> logically distinct files that physically share data and could easily
> >> share the cache as well if there was infrastructure for it.
> >
> > FYI this has been discussed at LSFMM this year[1]. I wasn't at the
> > session so cannot tell you any details but the LWN article covers it at
> > least briefly.
>
> Cool, so it's not such a crazy idea.

Oh, it most certainly is crazy. :P

> Darrick, would you mind briefly sharing your ideas regarding this?

The current line of thought is that we'll only attempt this in XFS on
inodes that are known to share underlying physical extents, i.e. files
that have blocks that have been reflinked or deduped. That way we can
overload the breaking of reflink blocks (via copy on write) with
unsharing the pages in the page cache for that inode. That means shared
pages can propagate upwards in overlay if it uses reflink for copy-up,
and writes will then break the sharing with the underlying source
without overlay having to do anything special.

Right now I'm not sure what mechanism we will use - we want to support
files that have a mix of private and shared pages, so that implies we
are not going to be sharing mappings but sharing pages instead.
However, we've been looking at this as being completely encapsulated
within the filesystem because it's tightly linked to changes in the
physical layout of the filesystem, not as general "share this mapping
between two unrelated inodes" infrastructure. That may change as we
dig deeper into it...

> The use case I have is fixing overlayfs weird behavior.
> The following may result in "buf" not matching "data":
>
>         int fr = open("foo", O_RDONLY);
>         int fw = open("foo", O_RDWR);
>         write(fw, data, sizeof(data));
>         read(fr, buf, sizeof(data));
>
> The reason is that "foo" is on a read-only layer, and opening it for
> read-write triggers copy-up into a read-write layer. However the old,
> read-only open still refers to the unmodified file.
>
> Fixing this properly requires that when opening a file, we don't
> delegate operations fully to the underlying file, but rather allow
> sharing of pages from the underlying file until the file is copied up.
> At that point we switch to sharing pages with the read-write copy.

Unless I'm missing something here (quite possible!), I'm not sure
we can fix that problem with page cache sharing or reflink. It
implies we are sharing pages in a downwards direction - private
overlay pages/mappings from multiple inodes would need to be shared
with a single underlying shared read-only inode, and I lack the
imagination to see how that works...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
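The upward, reflink-driven page sharing described above can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not XFS or overlayfs code: every name here (`toy_page`, `toy_inode`, `toy_reflink`, `toy_write`, and the rest) is hypothetical. Pages are tracked per index so a single file can hold a mix of private and shared pages, and a write to a shared page breaks the sharing for that page only. The same model also shows the overlayfs problem: a handle still pointing at the lower inode keeps seeing the old data after copy-up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model, not XFS code: a "page" is a refcounted buffer
 * that several inodes may point at after a reflink-style clone. */
#define TOY_PAGE_SIZE 8
#define TOY_NPAGES 4

struct toy_page {
    int refcount;                 /* number of inodes mapping this page */
    char data[TOY_PAGE_SIZE];
};

struct toy_inode {
    /* Per-page pointers, so private and shared pages can coexist. */
    struct toy_page *pages[TOY_NPAGES];
};

static struct toy_page *toy_page_alloc(char fill)
{
    struct toy_page *p = malloc(sizeof(*p));
    if (!p)
        abort();
    p->refcount = 1;
    memset(p->data, fill, TOY_PAGE_SIZE);
    return p;
}

static void toy_page_put(struct toy_page *p)
{
    if (--p->refcount == 0)
        free(p);
}

static void toy_inode_init(struct toy_inode *ino, char fill)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        ino->pages[i] = toy_page_alloc(fill);
}

static void toy_inode_release(struct toy_inode *ino)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        toy_page_put(ino->pages[i]);
}

/* Reflink-style clone (e.g. a copy-up): dst shares every page of src. */
static void toy_reflink(struct toy_inode *dst, const struct toy_inode *src)
{
    for (int i = 0; i < TOY_NPAGES; i++) {
        dst->pages[i] = src->pages[i];
        dst->pages[i]->refcount++;
    }
}

/* Write one byte. If the page is shared, copy it first (COW) so only
 * this inode sees the new data; other pages stay shared. */
static void toy_write(struct toy_inode *ino, int pg, int off, char c)
{
    assert(pg >= 0 && pg < TOY_NPAGES && off >= 0 && off < TOY_PAGE_SIZE);
    struct toy_page *p = ino->pages[pg];
    if (p->refcount > 1) {
        struct toy_page *copy = toy_page_alloc(0);
        memcpy(copy->data, p->data, TOY_PAGE_SIZE);
        toy_page_put(p);          /* drop this inode's ref on the shared page */
        ino->pages[pg] = copy;
        p = copy;
    }
    p->data[off] = c;
}

static char toy_read(const struct toy_inode *ino, int pg, int off)
{
    return ino->pages[pg]->data[off];
}

static int toy_is_shared(const struct toy_inode *a,
                         const struct toy_inode *b, int pg)
{
    return a->pages[pg] == b->pages[pg];
}
```

For example, initialise a "lower" inode, clone it into an "upper" inode with `toy_reflink()`, then `toy_write()` to the upper copy. The upper read returns the new byte, the lower read (the stale read-only open) still returns the old one, and only the written page stops being shared. Sharing flows upward from lower to upper, which is why it does not by itself redirect an existing lower handle to the new data.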