Re: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks

"Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> · Sat, 5 Oct 2019 14:05:29 +0530

On 10/3/19 5:21 PM, Peter Zijlstra wrote:
On Thu, Oct 03, 2019 at 09:11:45AM +0200, Peter Zijlstra wrote:
On Wed, Oct 02, 2019 at 10:33:15PM -0300, Leonardo Bras wrote:

....

And I still think all that wrong, you really shouldn't need to wait on
munmap().

I do have a patch that does something like that.

+#define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL
+static inline pmd_t pmdp_huge_get_and_clear_full(struct mm_struct *mm,
+						 unsigned long address, pmd_t *pmdp,
+						 int full)
+{
+	bool serialize = true;
+	/*
+	 * We don't need to serialize against a lockless page table walk if
+	 * we are clearing the pmd due to task exit. For regular munmap, we
+	 * still need to serialize due to the possibility of MADV_DONTNEED running
+	 * in parallel with a page fault, which can convert a THP pte entry to a
+	 * pointer to a level 4 table.
+	 * Here MADV_DONTNEED is removing the THP entry and the fault is filling
+	 * a level 4 pte.
+	 */
+	if (full == 1)
+		serialize = false;
+	return __pmdp_huge_get_and_clear(mm, address, pmdp, serialize);
 }

If it is a fullmm flush we can skip the serialization, but for everything
else we need to serialize. MADV_DONTNEED is another such case. I haven't sent
this yet, because I was trying to look at what it takes to switch that
MADV variant to take mmap_sem in write mode.
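
To make the rule above concrete, here is a minimal userspace sketch of the decision only. It is not kernel code: the enum, full_flag() and needs_serialize() are hypothetical names made up for illustration, and the mapping from "reason" to the full argument is my assumption about how callers would pass it (full == 1 only for the task-exit/fullmm teardown).

```c
#include <stdbool.h>

/* Hypothetical labels for why a huge PMD is being cleared (not kernel names). */
enum clear_reason {
	CLEAR_TASK_EXIT,	/* whole address space is going away (fullmm) */
	CLEAR_MUNMAP,		/* regular munmap() of a range */
	CLEAR_MADV_DONTNEED,	/* madvise(MADV_DONTNEED) on a range */
};

/* Assumption: only the task-exit teardown passes full == 1. */
static int full_flag(enum clear_reason reason)
{
	return reason == CLEAR_TASK_EXIT ? 1 : 0;
}

/*
 * Mirrors the check in the patch: skip serializing against lockless
 * page table walkers only for a fullmm flush.
 */
static bool needs_serialize(int full)
{
	return full != 1;
}
```

With this model, needs_serialize(full_flag(CLEAR_TASK_EXIT)) is false, while the munmap and MADV_DONTNEED cases both still serialize.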

MADV_DONTNEED has caused us multiple issues due to the fact that it can
run in parallel with a page fault. I am not sure whether we have a
known/noticeable performance gain in allowing that with mmap_sem held in
read mode.
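
For anyone following along who has not looked at MADV_DONTNEED from userspace, a small Linux-only sketch of the behaviour being discussed: the pages of a private anonymous mapping are dropped, and the next access faults in a fresh zero page. The function name dontneed_zeroes_page() is made up for this example. It is single-threaded, so it does not reproduce the race with a concurrent fault; it only shows the drop-then-refault pair that the race is made of.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a private anonymous page reads back as all zeroes after
 * MADV_DONTNEED, 0 if old data survived, -1 on any setup error.
 */
static int dontneed_zeroes_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int ret = 1;
	long i;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Touch the page so it is actually populated. */
	memset(p, 0xaa, (size_t)page);

	/* Drop the backing page; the mapping itself stays valid. */
	if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	/* This read faults in a fresh zero-filled page. */
	for (i = 0; i < page; i++) {
		if (p[i] != 0) {
			ret = 0;
			break;
		}
	}

	munmap(p, (size_t)page);
	return ret;
}
```

In the problematic case, that refault happens on another thread while the madvise() call is still clearing entries.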

-aneesh