On Wed, Sep 12, 2018 at 02:49:21PM +0800, Peter Xu wrote:
> Add an extra check on page dirty bit in change_pte_range() since there
> might be case where PTE dirty bit is unset but it's actually dirtied.
> One example is when a huge PMD is splitted after written: the dirty bit
> will be set on the compound page however we won't have the dirty bit set
> on each of the small page PTEs.
>
> I noticed this when debugging with a customized kernel that implemented
> userfaultfd write-protect. In that case, the dirty bit will be critical
> since that's required for userspace to handle the write protect page
> fault (otherwise it'll get a SIGBUS with a loop of page faults).
> However it should still be good even for upstream Linux to cover more
> scenarios where we shouldn't need to do extra page faults on the small
> pages if the previous huge page is already written, so the dirty bit
> optimization path underneath can cover more.
>

So, as Kirill said: NAK. You are not looking at the right place for
your bug. Please first apply the patch below and read the analysis in
my last reply.

The patch below fixes the userfaultfd bug. I am not posting it
separately as it applies to a branch, and I am not sure when Andrea
plans to post that branch. Andrea, feel free to squash this fix.

From 35cdb30afa86424c2b9f23c0982afa6731be961c Mon Sep 17 00:00:00 2001
From: Jérôme Glisse <jglisse@xxxxxxxxxx>
Date: Wed, 12 Sep 2018 08:58:33 -0400
Subject: [PATCH] userfaultfd: do not set dirty accountable when changing protection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

mwriteprotect_range() has nothing to do with the dirty accountable
optimization, so do not set it: doing so opens a door for userspace to
un-write-protect pages in a range that is write-protected, i.e. a vma
with !(vm_flags & VM_WRITE).
Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
---
 mm/userfaultfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index a0379c5ffa7c..59db1ce48fa0 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -632,7 +632,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	newprot = vm_get_page_prot(dst_vma->vm_flags);
 
 	change_protection(dst_vma, start, start + len, newprot,
-				!enable_wp, 0);
+				false, 0);
 
 	err = 0;
 out_unlock:
-- 
2.17.1
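For readers following the commit message, here is a simplified userspace
model in C of the per-PTE step in change_pte_range() that the
dirty_accountable argument controls. It assumes the v4.18-era logic
(pte_modify() with newprot, then pte_mkwrite() on a dirty PTE when
dirty_accountable is set and the soft-dirty check passes) and deliberately
leaves out pte_present(), prot_numa, preserve_write and TLB flushing.
struct model_vma, struct model_pte, model_change_pte() and
check_scenarios() are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the per-PTE decision in the
 * v4.18-era change_pte_range(). These are NOT kernel types.
 */
struct model_vma {
	bool vm_write;     /* models VM_WRITE in vma->vm_flags */
	bool vm_softdirty; /* models VM_SOFTDIRTY in vma->vm_flags */
};

struct model_pte {
	bool write;      /* hardware write permission */
	bool dirty;      /* hardware dirty bit */
	bool soft_dirty; /* soft-dirty tracking bit */
};

/*
 * Apply a new protection to one PTE. new_prot_write models the write
 * bit of newprot, i.e. vm_get_page_prot(vm_flags) when write
 * protection is being removed in mwriteprotect_range().
 */
static struct model_pte model_change_pte(const struct model_vma *vma,
					 struct model_pte pte,
					 bool new_prot_write,
					 bool dirty_accountable)
{
	/* pte_modify(): the write permission now comes from newprot */
	pte.write = new_prot_write;

	/* "Avoid taking write faults for known dirty pages" */
	if (dirty_accountable && pte.dirty &&
	    (pte.soft_dirty || !vma->vm_softdirty))
		pte.write = true; /* models pte_mkwrite() */

	return pte;
}

/* Un-write-protect (enable_wp == false) a dirty page in a !VM_WRITE vma. */
void check_scenarios(void)
{
	const struct model_vma ro_vma = { .vm_write = false, .vm_softdirty = false };
	const struct model_pte dirty_ro = { .write = false, .dirty = true,
					    .soft_dirty = false };
	/* newprot derived from a vma without VM_WRITE has no write bit */
	const bool new_prot_write = ro_vma.vm_write;

	/* Before the patch: dirty_accountable = !enable_wp = true */
	struct model_pte before =
		model_change_pte(&ro_vma, dirty_ro, new_prot_write, true);
	assert(before.write); /* writable despite !(vm_flags & VM_WRITE) */

	/* After the patch: dirty_accountable = false */
	struct model_pte after =
		model_change_pte(&ro_vma, dirty_ro, new_prot_write, false);
	assert(!after.write); /* stays read-only, matching the vma */
}
```

Calling check_scenarios() from any main() exercises both cases: with
dirty_accountable = !enable_wp (true when write protection is being
removed), a dirty PTE in a vma without VM_WRITE comes out writable; with
false, as in the patch, it keeps the read-only protection derived from
vm_flags.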