Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section

Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx> · Thu, 22 Mar 2018 17:18:55 +0100

On 22/03/2018 17:05, Matthew Wilcox wrote:
> On Thu, Mar 22, 2018 at 04:54:52PM +0100, Laurent Dufour wrote:
>> On 22/03/2018 16:40, Matthew Wilcox wrote:
>>> On Thu, Mar 22, 2018 at 04:32:00PM +0100, Laurent Dufour wrote:
>>>> Regarding the page fault, why not rely on the PTE locking?
>>>>
>>>> When munmap() clears the PTE it has to hold the PTE lock, so this
>>>> serializes the access.
>>>> If the page fault occurs before the mmap(MAP_FIXED), the mapped page will be
>>>> removed when mmap(MAP_FIXED) does the cleanup. Fair enough.
>>>
>>> The page fault handler will walk the VMA tree to find the correct
>>> VMA and then find that the VMA is marked as deleted.  If it assumes
>>> that the VMA has been deleted because of munmap(), then it can raise
>>> SIGSEGV immediately.  But if the VMA is marked as deleted because of
>>> mmap(MAP_FIXED), it must wait until the new VMA is in place.
>>
>> I wonder whether such complexity is required.
>> If a user space thread tries to access a page being overwritten through
>> mmap(MAP_FIXED) by another thread, there is no guarantee whether it will
>> manipulate the *old* page or the *new* one.
> 
> Right; but it must return one or the other, it can't segfault.

Good point, I missed that...

> 
>> I'd think it is up to the user process to handle that concurrency.
>> What needs to be guaranteed is that once mmap(MAP_FIXED) returns, the old pages
>> are no longer there, which is done through the mmap_sem and PTE locking.
> 
> Yes, and allowing the fault handler to return the *old* page risks the
> old page being reinserted into the page tables after the unmapping task
> has done its work.

The PTE locking should prevent that.

> It's *really* rare to page-fault on a VMA which is in the middle of
> being replaced.  Why are you trying to optimise it?

I was not trying to optimize it, but to avoid waiting in the page fault handler.
Waiting becomes tricky if the VMA is removed once mmap(MAP_FIXED) is done but
before the waiting page fault has been woken up. The removed VMA structure
would then have to stay around until all the waiters have been woken up,
which implies a reference count or something similar.
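
To make the lifetime concern concrete, here is a minimal userspace sketch of
the reference counting I have in mind. It is only a model: struct vma_model
and its helpers are hypothetical names, not existing kernel code, and it
assumes one reference held by the VMA tree plus one per sleeping fault.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct vma_model {
	atomic_int refcount;	/* 1 for the VMA tree + 1 per waiting fault */
	bool *freed_flag;	/* demonstration only: set when released */
};

/* Allocate a VMA holding the single reference owned by the VMA tree. */
static struct vma_model *vma_model_alloc(bool *freed_flag)
{
	struct vma_model *vma = malloc(sizeof(*vma));

	if (!vma)
		return NULL;
	atomic_init(&vma->refcount, 1);
	vma->freed_flag = freed_flag;
	*freed_flag = false;
	return vma;
}

/* A fault that is about to sleep on the VMA pins it first. */
static void vma_model_get(struct vma_model *vma)
{
	int old = atomic_fetch_add(&vma->refcount, 1);

	assert(old > 0);	/* taking a reference on a freed VMA is a bug */
}

/*
 * Drop one reference. The structure is freed only by whoever drops the
 * last one. Returns true if this call released the structure.
 */
static bool vma_model_put(struct vma_model *vma)
{
	if (atomic_fetch_sub(&vma->refcount, 1) != 1)
		return false;
	*vma->freed_flag = true;
	free(vma);
	return true;
}
```

With this, mmap(MAP_FIXED) dropping the tree's reference does not free the
structure while a fault is still asleep on it; the last woken waiter does.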

> 
>>> I think I was wrong to describe VMAs as being *deleted*.  I think we
>>> instead need the concept of a *locked* VMA that page faults will block on.
>>> Conceptually, it's a per-VMA rwsem, but I'd use a completion instead of
>>> an rwsem since the only reason to write-lock the VMA is because it is
>>> being deleted.
>>
>> Such a lock would only make sense in the case of mmap(MAP_FIXED), since when
>> the VMA is simply removed there is no need to wait. Is that right?
> 
> I can't think of another reason.  I suppose we could mark the VMA as
> locked-for-deletion or locked-for-replacement and have the SIGSEGV happen
> early.  But I'm not sure that optimising for SIGSEGVs is a worthwhile
> use of our time.  Just always have the pagefault sleep for a deleted VMA.
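
For reference, the "always sleep on a locked VMA" behaviour could look
roughly like the userspace sketch below. It is an assumption-laden model,
not a proposal for the actual code: a pthread mutex and condition variable
stand in for the kernel completion, and struct locked_vma_model and all the
helpers are hypothetical names.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct locked_vma_model {
	pthread_mutex_t lock;
	pthread_cond_t unlocked;	/* stands in for a kernel completion */
	bool locked;			/* true while the VMA is being replaced */
	int faults_resumed;		/* demonstration only */
};

static void vma_model_init(struct locked_vma_model *vma)
{
	pthread_mutex_init(&vma->lock, NULL);
	pthread_cond_init(&vma->unlocked, NULL);
	vma->locked = false;
	vma->faults_resumed = 0;
}

/* Called by the task about to delete or replace the VMA. */
static void vma_model_lock(struct locked_vma_model *vma)
{
	pthread_mutex_lock(&vma->lock);
	vma->locked = true;
	pthread_mutex_unlock(&vma->lock);
}

/* Called once the new VMA is in place: wake every sleeping fault. */
static void vma_model_unlock(struct locked_vma_model *vma)
{
	pthread_mutex_lock(&vma->lock);
	assert(vma->locked);
	vma->locked = false;
	pthread_cond_broadcast(&vma->unlocked);
	pthread_mutex_unlock(&vma->lock);
}

/* Fault path: always sleep while the VMA is locked, then carry on. */
static void vma_model_fault(struct locked_vma_model *vma)
{
	pthread_mutex_lock(&vma->lock);
	while (vma->locked)
		pthread_cond_wait(&vma->unlocked, &vma->lock);
	vma->faults_resumed++;
	pthread_mutex_unlock(&vma->lock);
}

static int vma_model_faults_resumed(struct locked_vma_model *vma)
{
	pthread_mutex_lock(&vma->lock);
	int n = vma->faults_resumed;
	pthread_mutex_unlock(&vma->lock);
	return n;
}

/* Thread entry point so a fault can run concurrently with the unmapper. */
static void *fault_thread(void *arg)
{
	vma_model_fault(arg);
	return NULL;
}
```

After waking up, the real fault handler would still have to look the address
up again, and raise SIGSEGV if no VMA covers it any more; the model leaves
that step out.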