Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping

Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> · Mon, 2 Jul 2018 17:01:12 -0700

On 6/29/18 9:26 PM, Yang Shi wrote:

On 6/29/18 8:15 PM, Andrew Morton wrote:
On Fri, 29 Jun 2018 19:28:15 -0700 Yang Shi 
<yang.shi@xxxxxxxxxxxxxxxxx> wrote:

we're adding a bunch of code to 32-bit kernels which will never be
executed.

I'm thinking it would be better to be much more explicit with "#ifdef
CONFIG_64BIT" in this code, rather than relying upon the above magic.

But I tend to think that the fact that we haven't solved anything on
locked vmas or on uprobed mappings is a shostopper for the whole
approach :(
I agree it is not that perfect. But, it still could improve the most 
use
cases.
Well, those unaddressed usecases will need to be fixed at some point.

Yes, definitely.

What's our plan for that?

As I mentioned in the earlier email, locked and hugetlb cases might be 
able to be solved by separating vm_flags update and actual unmap. I 
will look into it further later.

By looking into this further， I think both mlocked and hugetlb vmas can 
be handled.

For mlocked vmas, it is easy since we acquires write mmap_sem before 
unmapping, so VM_LOCK flags can be cleared here then unmap, just like 
what the regular path does.

For hugetlb vmas, the VM_MAYSHARE flag is just checked by 
huge_pmd_share() in hugetlb_fault()->huge_pte_alloc(), another call site 
is dup_mm()->copy_page_range()->copy_hugetlb_page_range(), we don't care 
this call chain in this case.

So we may expand VM_DEAD to hugetlb_fault().  Michal suggested to check 
VM_DEAD in check_stable_address_space(), so it would be called in 
hugetlb_fault() too (not in current code), then the page fault handler 
would bail out before huge_pte_alloc() is called.

With this trick, we don't have to care about when the vm_flags is 
updated, we can unmap hugetlb vmas in read mmap_sem critical section, 
then update the vm_flags with write mmap_sem held or before the unmap.

Yang

From my point of view, uprobe mapping sounds not that vital.

Would one of your earlier designs have addressed all usecases? I
expect the dumb unmap-a-little-bit-at-a-time approach would have?

Yes. The v1 design does unmap with holding write map_sem. So, the 
vm_flags update is not a problem.

Thanks,
Yang

For the locked vmas and hugetlb vmas, unmapping operations need modify
vm_flags. But, I'm wondering we might be able to separate unmap and
vm_flags update. Because we know they will be unmapped right away, the
vm_flags might be able to be updated in write mmap_sem critical section
before the actual unmap is called or after it. This is just off the top
of my head.

For uprobed mappings, I'm not sure how vital it is to this case.

Thanks,
Yang