Re: [PATCH v2 6/9] mm: free user PTE page table pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01.09.21 17:32, Jason Gunthorpe wrote:
On Wed, Sep 01, 2021 at 03:57:09PM +0200, David Hildenbrand wrote:
On 01.09.21 15:53, Jason Gunthorpe wrote:
On Thu, Aug 19, 2021 at 11:18:55AM +0800, Qi Zheng wrote:

diff --git a/mm/gup.c b/mm/gup.c
index 2630ed1bb4f4..30757f3b176c 100644
+++ b/mm/gup.c
@@ -500,6 +500,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
   	if (unlikely(pmd_bad(*pmd)))
   		return no_page_table(vma, flags);
+	if (!pte_try_get(mm, pmd))
+		return no_page_table(vma, flags);
+
   	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);

This is not good on a performance path, the pte_try_get() is
locking/locking the same lock that pte_offset_map_lock() is getting.

Yes, and we really need patch #8, anything else is just confusing reviewers.

It is a bit better with patch 8, but it is still not optimal, we don't
need to do the atomic work at all if the entire ptep is accessed while
locked. So the above is stil not what I would expect here, even with
RCU.

eg I would expect that this kind of change would work first with the
existing paired acessors, ie

	pte = pte_offset_map(pmd, address);
	pte_unmap(pte);

Should handle the refcount under the covers, and same kind of idea for
the _locked/_unlocked varient.

See my other mail.


Only places that don't already use that pairing should get modified.

To do this we have to extend the API so that pte_offset_map() can
fail, or very cleverly return some kind of global non-present pte page
(I wonder if the zero page would work?)

I explored both ideas (returning NULL, return a specially prepared page) and it didn't work in some cases where we unmap+remap etc.


Also, I don't really understand how this scheme works with
get_user_pages_fast.

With the RCU change it in #8 it should work just fine, because RCU
synchronize has to wait either until all other CPUs have left the RCU read
section, or re-enabled interrupts.

So at this point in the series fast gup is broken, that does mean the
series presentation really needs to be reworked. The better
presentation is to add the API changes, with a
no-functional-difference implementation, push the new API in well
split patches to all the consumption sites, then change the API to
have the new semantics.

Exactly my thoughts.


RCU and refcount to free the page levels seems like a reasonable
approach, but I have to say I haven't thought it through fully - are
all the contexts that have the pte deref safe to do call_rcu?


Very good question. I'd assume so.


--
Thanks,

David / dhildenb




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux