On Fri, May 20, 2016 at 1:48 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, May 19, 2016 at 12:17:14PM +0200, Miklos Szeredi wrote:
>> On Thu, May 19, 2016 at 11:05 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>> > On Thu 19-05-16 10:20:13, Miklos Szeredi wrote:
>> >> Has anyone thought about sharing pages between multiple files?
>> >>
>> >> The obvious application is for COW filesystems where there are
>> >> logically distinct files that physically share data and could easily
>> >> share the cache as well if there was infrastructure for it.
>> >
>> > FYI this has been discussed at LSFMM this year[1]. I wasn't at the
>> > session so cannot tell you any details but the LWN article covers it at
>> > least briefly.
>>
>> Cool, so it's not such a crazy idea.
>
> Oh, it most certainly is crazy. :P
>
>> Darrick, would you mind briefly sharing your ideas regarding this?
>
> The current line of thought is that we'll only attempt this in XFS on
> inodes that are known to share underlying physical extents. i.e.
> files that have blocks that have been reflinked or deduped. That
> way we can overload the breaking of reflink blocks (via copy on
> write) with unsharing the pages in the page cache for that inode.
> i.e. shared pages can propagate upwards in overlay if it uses
> reflink for copy-up and writes will then break the sharing with the
> underlying source without overlay having to do anything special.
>
> Right now I'm not sure what mechanism we will use - we want to
> support files that have a mix of private and shared pages, so that
> implies we are not going to be sharing mappings but sharing pages
> instead. However, we've been looking at this as being completely
> encapsulated within the filesystem because it's tightly linked to
> changes in the physical layout of the filesystem, not as general
> "share this mapping between two unrelated inodes" infrastructure.
> That may change as we dig deeper into it...
>
>> The use case I have is fixing overlayfs weird behavior. The following
>> may result in "buf" not matching "data":
>>
>>     int fr = open("foo", O_RDONLY);
>>     int fw = open("foo", O_RDWR);
>>     write(fw, data, sizeof(data));
>>     read(fr, buf, sizeof(data));
>>
>> The reason is that "foo" is on a read-only layer, and opening it for
>> read-write triggers copy-up into a read-write layer. However the old,
>> read-only open still refers to the unmodified file.
>>
>> Fixing this properly requires that when opening a file, we don't
>> delegate operations fully to the underlying file, but rather allow
>> sharing of pages from underlying file until the file is copied up. At
>> that point we switch to sharing pages with the read-write copy.
>
> Unless I'm missing something here (quite possible!), I'm not sure
> we can fix that problem with page cache sharing or reflink. It
> implies we are sharing pages in a downwards direction - private
> overlay pages/mappings from multiple inodes would need to be shared
> with a single underlying shared read-only inode, and I lack the
> imagination to see how that works...

Indeed, reflink doesn't make this work.

We could reflink-up on any open (or on lookup), not just on write,
it's a trivial change in overlayfs. Drawback is slower first
open/lookup and space used by duplicate trees even without
modification on the overlay. Not sure if that's a problem in
practice.

I'll think about the generic downwards sharing. For overlayfs it
doesn't need to be per-page, so that might make it somewhat simpler
problem.

Thanks,
Miklos
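
For anyone who wants to try the open/write/read sequence quoted above,
here is a minimal self-contained sketch of it. The helper name
read_sees_write() and the test string are made up for illustration and
are not from any kernel or overlayfs code; the function only wraps the
four calls from the snippet with error checking and a comparison.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not part of overlayfs).
 * Opens `path` read-only, then read-write, writes through the second
 * descriptor and reads back through the first one.
 *
 * Returns 1 if the read-only descriptor sees the new data,
 *         0 if it does not,
 *        -1 if any system call fails.
 * The file at `path` must already exist and be writable.
 */
static int read_sees_write(const char *path)
{
    static const char data[] = "new contents";
    char buf[sizeof(data)];
    int result = -1;

    assert(path != NULL);

    /* Read-only open happens first, as in the quoted snippet. */
    int fr = open(path, O_RDONLY);
    if (fr < 0)
        return -1;

    /* Per the thread, on overlayfs this open triggers copy-up when the
     * file only exists on a read-only lower layer. */
    int fw = open(path, O_RDWR);
    if (fw < 0) {
        close(fr);
        return -1;
    }

    memset(buf, 0, sizeof(buf));
    if (write(fw, data, sizeof(data)) == (ssize_t)sizeof(data)) {
        /* fr was opened separately, so its file offset is still 0. */
        ssize_t n = read(fr, buf, sizeof(data));
        if (n >= 0)
            result = (n == (ssize_t)sizeof(data) &&
                      memcmp(buf, data, sizeof(data)) == 0);
    }

    close(fw);
    close(fr);
    return result;
}
```

On an ordinary filesystem both descriptors refer to the same inode, so
read_sees_write("foo") should return 1. On an overlay mount where
"foo" exists only on the lower layer, the behavior described above
would show up as a return value of 0, because the read-only descriptor
still refers to the unmodified lower file.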