Re: [LSF/MM TOPIC] Sharing file backed pages

Jerome Glisse <jglisse@xxxxxxxxxx> · Wed, 23 Jan 2019 10:12:29 -0500

On Wed, Jan 23, 2019 at 03:54:34PM +0100, Jan Kara wrote:
> On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> > In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> > up the subject of sharing pages between cloned files and the general vibe
> > in the room was that it could be done.
> > 
> > In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> > that Matthew Wilcox was "working on that problem".
> > 
> > I have started working on a new overlayfs address space implementation
> > that could also benefit from being able to share pages even for filesystems
> > that do not support clones (for copy up anticipation state).
> > 
> > To simplify the problem, we can start with sharing only uptodate clean
> > pages that map the same offset in respective files. While the same offset
> > requirement somewhat limits the use cases that benefit from shared file
> > pages, there is still a vast majority of use cases (i.e. clone full
> > image), where sharing pages of similar offset will bring a lot of
> > benefit.
> > 
> > At first glance, this requires dropping the assumption that for an
> > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > Is there really such an assumption in common vfs/mm code?  and what will
> > it take to drop it?
> 
> There definitely is such an assumption. Take for example page reclaim as one
> such place that will be non-trivial to deal with. You need to remove the
> page from page cache of all inodes that contain it without having any file
> context whatsoever. So you will need to create some way for this page->page
> caches mapping to happen. Jerome in his talk at LSF/MM last year [1] actually
> nicely summarized what it would take to get rid of page->mapping
> dereferences. He even had some preliminary patches. To sum it up, it's a
> lot of intrusive work but in principle it is possible.
> 
> [1] https://lwn.net/Articles/752564/
> 

I intend to post a v2 of my patchset doing that sometime soon. For
various reasons this was pushed to the bottom of my todo list over the
last year. It is now almost at the top and it will stay at the top.
So I will be resuming work on that.

I wanted to propose this topic again as a joint session with mm so
here is my proposal:

I would like to discuss the removal of the page mapping field dependency
in most kernel code paths so that we can overload that field for generic
page write protection (KSM) for file-backed pages. The whole idea behind
this is that we almost always have the mapping a page belongs to within
the call stack, since any function that operates on a file or on a vma
has it:
    - syscall/kernel on a file (file -> inode -> mapping)
    - syscall/kernel on virtual address (vma -> file -> mapping)
    - write back for a given mapping
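
To illustrate the first two cases, here is a minimal userspace sketch.
It is not kernel code: the structs are simplified mocks that only borrow
the kernel's field names (i_mapping, f_inode, vm_file), and the two
helper names are hypothetical. It only shows that the mapping can be
reached from the file or vma already in hand, without touching
page->mapping.

```c
/* Userspace mock, NOT kernel code: simplified stand-in structs that
 * only borrow the kernel's field names. Helper names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct address_space { int id; };
struct inode { struct address_space *i_mapping; };
struct file { struct inode *f_inode; };
struct vm_area_struct { struct file *vm_file; };

/* syscall/kernel on a file: file -> inode -> mapping */
static struct address_space *mapping_from_file(const struct file *file)
{
    assert(file != NULL && file->f_inode != NULL);
    return file->f_inode->i_mapping;
}

/* syscall/kernel on a virtual address: vma -> file -> mapping */
static struct address_space *mapping_from_vma(const struct vm_area_struct *vma)
{
    /* A vma with no backing file (anonymous memory) has no file mapping. */
    if (vma == NULL || vma->vm_file == NULL)
        return NULL;
    return mapping_from_file(vma->vm_file);
}
```

The third case, write back for a given mapping, needs no lookup at all
since the mapping is the starting point of the operation.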

Note that the plan is not to free up the mapping field in struct page
but to reduce the number of places that need the mapping corresponding
to a page to as few as possible. The few exceptions are:
    - page reclaim
    - memory compaction
    - set_page_dirty() on GUPed (get_user_pages*()) pages

For page reclaim and memory compaction we do not care about mapping
exactly but about being able to unmap/migrate a page. So any over-
loading of mapping needs to keep providing helpers to handle those
cases.

For set_page_dirty() on GUPed pages we can take a slow path if the
page has an overloaded mapping field.
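
As a rough userspace sketch of that fast/slow split (again not kernel
code): the tag bit, the shared_desc structure, and what the slow path
actually does are all assumptions made up for illustration. Here the
slow path simply walks every mapping sharing the page and accounts the
dirtying in each one; a real implementation could do something quite
different, such as breaking the sharing first.

```c
/* Userspace mock, NOT kernel code. The tag bit, struct shared_desc and
 * the slow-path behaviour are hypothetical, for illustration only. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAPPING_OVERLOADED ((uintptr_t)1) /* hypothetical low tag bit */

struct address_space { int nr_dirty; };

/* Hypothetical descriptor for a page shared by several mappings. */
struct shared_desc {
    struct address_space **mappings;
    size_t nr;
};

struct page {
    uintptr_t mapping; /* address_space pointer, or shared_desc pointer | tag */
    bool dirty;
};

static bool page_mapping_overloaded(const struct page *page)
{
    return (page->mapping & MOCK_MAPPING_OVERLOADED) != 0;
}

/* Marks the page dirty; returns true when the slow path was taken. */
static bool mock_set_page_dirty(struct page *page)
{
    page->dirty = true;

    if (!page_mapping_overloaded(page)) {
        /* Fast path: the field holds a plain mapping pointer. */
        struct address_space *mapping = (struct address_space *)page->mapping;
        if (mapping != NULL)
            mapping->nr_dirty++;
        return false;
    }

    /* Slow path: strip the tag and visit every mapping sharing the page. */
    struct shared_desc *desc =
        (struct shared_desc *)(page->mapping & ~MOCK_MAPPING_OVERLOADED);
    for (size_t i = 0; i < desc->nr; i++)
        desc->mappings[i]->nr_dirty++;
    return true;
}
```

The point of the split is that callers without any file or vma context
only pay the extra cost when the mapping field is actually overloaded.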

Previous patchset:
https://lore.kernel.org/lkml/20180404191831.5378-1-jglisse@xxxxxxxxxx/

Cheers,
Jérôme