On Fri, 10 Apr 2020 08:25:18 +0800 > Greetings, > > 0day kernel testing robot got the below dmesg and the first bad commit is > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master > > commit f45ec5ff16a75f96dac8c89862d75f1d8739efd4 > Author: Peter Xu <peterx@xxxxxxxxxx> > AuthorDate: Mon Apr 6 20:06:01 2020 -0700 > Commit: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > CommitDate: Tue Apr 7 10:43:39 2020 -0700 > > userfaultfd: wp: support swap and page migration > > For either swap and page migration, we all use the bit 2 of the entry to > identify whether this entry is uffd write-protected. It plays a similar > role as the existing soft dirty bit in swap entries but only for keeping > the uffd-wp tracking for a specific PTE/PMD. > > Something special here is that when we want to recover the uffd-wp bit > from a swap/migration entry to the PTE bit we'll also need to take care of > the _PAGE_RW bit and make sure it's cleared, otherwise even with the > _PAGE_UFFD_WP bit we can't trap it at all. > > In change_pte_range() we do nothing for uffd if the PTE is a swap entry. > That can lead to data mismatch if the page that we are going to write > protect is swapped out when sending the UFFDIO_WRITEPROTECT. This patch > also applies/removes the uffd-wp bit even for the swap entries. > Have trouble understanding the last sentence in the paragraph above and particularly linking it to the first one. > If you fix the issue, kindly add following tag > Reported-by: kernel test robot <lkp@xxxxxxxxx> > > [child3:925] eventfd (323) returned ENOSYS, marking as inactive. > [ 132.014801] can: request_module (can-proto-2) failed. > [ 132.063717] can: request_module (can-proto-2) failed. > [ 137.186037] trinity-c2 (943) used greatest stack depth: 5804 bytes left > [ 140.771486] MCE: Killing trinity-c2:956 due to hardware memory corruption fault at 8bd2060 > [ 140.777858] BUG: Bad rss-counter state mm:b278fc66 type:MM_ANONPAGES val:1 > [ 140.778736] BUG: Bad rss-counter state mm:b278fc66 type:MM_SHMEMPAGES val:2 > [ 141.589424] MCE: Killing trinity-c3:940 due to hardware memory corruption fault at 8a8c860 > [ 141.590730] swap_info_get: Bad swap file entry 700b8216 > [ 141.591400] BUG: Bad page map in process trinity-c3 pte:17042c3c pmd:b1809067 > [ 141.592304] addr:08bcf000 vm_flags:00100073 anon_vma:f1f29528 mapping:00000000 index:8bcf > [ 141.593399] file:(null) fault:0x0 mmap:0x0 readpage:0x0 > [ 141.594065] CPU: 0 PID: 940 Comm: trinity-c3 Not tainted 5.6.0-11490-gf45ec5ff16a75 #1 > [ 141.595055] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 > [ 141.596093] Call Trace: > [ 141.596443] dump_stack+0x16/0x18 > [ 141.596868] print_bad_pte+0x13f/0x159 > [ 141.597367] unmap_page_range+0x2a7/0x3e7 > [ 141.597893] unmap_single_vma+0x53/0x5d > [ 141.598383] unmap_vmas+0x2c/0x3b > [ 141.598811] exit_mmap+0x81/0xfc > [ 141.599238] __mmput+0x25/0x8d > [ 141.599633] mmput+0x28/0x2b > [ 141.600007] do_exit+0x2f0/0x84a > [ 141.600449] ? ___might_sleep+0x3f/0x11f > [ 141.600949] do_group_exit+0x86/0x86 > [ 141.601421] __ia32_sys_exit_group+0x15/0x15 > [ 141.601965] do_fast_syscall_32+0x86/0xbf > [ 141.602481] entry_SYSENTER_32+0xaf/0x101 Because is_swap_pte(oldpte) != IS_ENABLED(CONFIG_MIGRATION)), restore the old behavior by modifying uffd_wp only for pte that is __not__ swap entry, as the commit log says. --- b/mm/mprotect.c +++ c/mm/mprotect.c @@ -139,7 +139,7 @@ static unsigned long change_pte_range(st } ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); pages++; - } else if (is_swap_pte(oldpte)) { + } else if (IS_ENABLED(CONFIG_MIGRATION)) { swp_entry_t entry = pte_to_swp_entry(oldpte); pte_t newpte; @@ -154,7 +154,9 @@ static unsigned long change_pte_range(st newpte = pte_swp_mksoft_dirty(newpte); if (pte_swp_uffd_wp(oldpte)) newpte = pte_swp_mkuffd_wp(newpte); - } else if (is_write_device_private_entry(entry)) { + } + + if (is_write_device_private_entry(entry)) { /* * We do not preserve soft-dirtiness. See * copy_one_pte() for explanation. @@ -163,11 +165,18 @@ static unsigned long change_pte_range(st newpte = swp_entry_to_pte(entry); if (pte_swp_uffd_wp(oldpte)) newpte = pte_swp_mkuffd_wp(newpte); - } else { - newpte = oldpte; } - if (uffd_wp) + /* + * do nothing for changing uffd_wp if oldpte is a + * swap entry. + * That can lead to data mismatch if the page we + * are going to write protect is swapped out when + * sending the UFFDIO_WRITEPROTECT. + */ + if (is_swap_pte(oldpte)) + newpte = oldpte; + else if (uffd_wp) newpte = pte_swp_mkuffd_wp(newpte); else if (uffd_wp_resolve) newpte = pte_swp_clear_uffd_wp(newpte);