When splitting a huge page, we should mark all the small pages dirty
if the original huge page had the dirty bit set before the split.
Otherwise we'll lose the original dirty bit.

CC: Andrea Arcangeli <aarcange@xxxxxxxxxx>
CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
CC: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
CC: Michal Hocko <mhocko@xxxxxxxx>
CC: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
CC: Huang Ying <ying.huang@xxxxxxxxx>
CC: Dan Williams <dan.j.williams@xxxxxxxxx>
CC: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
CC: "Jérôme Glisse" <jglisse@xxxxxxxxxx>
CC: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
CC: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
CC: Souptick Joarder <jrdr.linux@xxxxxxxxx>
CC: linux-mm@xxxxxxxxx
CC: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---

To the reviewers: I'm new to the mm world, so apologies if this patch
makes a silly mistake; however, it did solve a problem for me when
testing with a customized Linux tree based mostly on Andrea's
userfaultfd write-protect work. Without the change, my customized
QEMU/TCG tree is unable to perform UFFDIO_WRITEPROTECT correctly, and
QEMU then gets a SIGBUS after faulting multiple times. With the change
(or, of course, with THP disabled), UFFDIO_WRITEPROTECT correctly
resolves the write protections and everything runs well.

Any comment would be welcomed. TIA.
---
 mm/huge_memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c3bc7e9c9a2a..0754a16923d5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2176,6 +2176,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		entry = pte_mkold(entry);
 		if (soft_dirty)
 			entry = pte_mksoft_dirty(entry);
+		if (dirty)
+			entry = pte_mkdirty(entry);
 	}
 	pte = pte_offset_map(&_pmd, addr);
 	BUG_ON(!pte_none(*pte));
-- 
2.17.1
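
For reviewers less familiar with this path, below is a minimal sketch
(not the exact kernel code) of how each small PTE ends up being
assembled in __split_huge_pmd_locked() once this change is applied.
It assumes, as the surrounding code does, that the flag variables were
sampled from the original huge PMD earlier in the function:

	/*
	 * Assumed to have been read from the huge PMD before the
	 * split tears it down, e.g.:
	 *
	 *	write      = pmd_write(old_pmd);
	 *	young      = pmd_young(old_pmd);
	 *	dirty      = pmd_dirty(old_pmd);
	 *	soft_dirty = pmd_soft_dirty(old_pmd);
	 */
	entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
	if (!write)
		entry = pte_wrprotect(entry);	/* keep write protection */
	if (!young)
		entry = pte_mkold(entry);	/* keep the accessed state */
	if (soft_dirty)
		entry = pte_mksoft_dirty(entry);	/* keep soft-dirty */
	if (dirty)
		entry = pte_mkdirty(entry);	/* added: keep the dirty bit */

Without the final pte_mkdirty(), the dirty information carried by the
huge PMD is silently dropped when the PMD is replaced by small PTEs.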