Re: Avoiding allocation of unused shmem page

David Hildenbrand <david@xxxxxxxxxx> · Fri, 21 Oct 2022 16:10:41 +0200

On 21.10.22 16:01, Peter Xu wrote:
On Fri, Oct 21, 2022 at 09:23:00AM +0200, David Hildenbrand wrote:
On 20.10.22 23:10, Peter Xu wrote:
On Thu, Oct 20, 2022 at 09:14:09PM +0100, Matthew Wilcox wrote:
In yesterday's call, David brought up the case where we fallocate a file
in shmem, call mmap(MAP_PRIVATE) and then store to a page which is over
a hole.  That currently causes shmem to allocate a page, zero-fill it,
then COW it, resulting in two pages being allocated when only the
COW page really needs to be allocated.

The path we currently take through the MM when we take the page fault
looks like this (correct me if I'm wrong ...):

handle_mm_fault()
__handle_mm_fault()
handle_pte_fault()
do_fault()
do_cow_fault()
__do_fault()
vm_ops->fault()

... which is where we come into shmem_fault().  Apart from the
horrendous hole-punch handling case, shmem_fault() is quite simple:

          err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
                                    gfp, vma, vmf, &ret);
          if (err)
                  return vmf_error(err);
          vmf->page = folio_file_page(folio, vmf->pgoff);
          return ret;

What we could do here is detect this case.  Something like:

	enum sgp_type sgp = SGP_CACHE;

	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
		sgp = SGP_READ;

Yes this will start to save the space, but just to mention this may start
to break anything that will still depend on the pagecache to work.  E.g.,
it'll change behavior if the vma is registered with uffd missing mode;
we'll start to lose MISSING events for these private mappings.  Not sure
whether there're other side effects.

I don't follow, can you elaborate?

hugetlb doesn't perform this kind of unnecessary allocation and should be fine in regards to uffd. Why should it matter here and how exactly would a problematic sequence look like?

Hugetlb is special because hugetlb detects pte first and relies on pte at
least for uffd.  shmem is not.

Feel free to also reference the recent fix which relies on the stable
hugetlb pte with commit 2ea7ff1e39cbe375.

Sorry to be dense here, but I don't follow how that relates.

Assume we have a MAP_PRIVATE shmem mapping and someone registers uffd 
missing events on that mapping.

Assume we get a page fault on a hole. We detect no page is mapped and 
check if the page cache has a page mapped -- which is also not the case, 
because there is a hole.

So we notify uffd.

Uffd will place a page. It should *not* touch the page cache and only 
insert that page into the page table -- otherwise we'd be violating 
MAP_PRIVATE semantics.

What am I missing?

[...]

There is an easy way to trigger this from QEMU, and we've had
customers running into this:

Can the customer simply set shared=on?

Of course they can. It rather comes with a surprise for them, because -- 
for now -- we're not even warning that this most probably doesn't make 
too much sense.

--
Thanks,

David / dhildenb