On Mon, Mar 30, 2020 at 11:50:41AM -0700, Yang Shi wrote:
> On Sat, Mar 28, 2020 at 5:33 AM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Mar 27, 2020 at 09:17:00PM -0400, Zi Yan wrote:
> > > > The compound page may be locked here if the function is called for the
> > > > first time for the page, and not locked after that (because we've
> > > > unlocked it when we saw it the first time). The same with LRU.
> > > >
> > >
> > > For the first time, the compound page is locked and not on LRU, so this VM_BUG_ON passes.
> > > For the second time and so on, the compound page is unlocked and on the LRU,
> > > so this VM_BUG_ON still passes.
> > >
> > > For base page, VM_BUG_ON passes.
> > >
> > > Other unexpected situation (a compound page is locked and on LRU) triggers the VM_BUG_ON,
> > > but your VM_BUG_ON will not detect this situation, right?
> >
> > Right. I will rework this code. I've just realized it is racy: after
> > unlock and putback on LRU the page can be locked by somebody else and this
> > code can unlock it, which is completely broken.
> >
> > I'll pass down compound_pagelist to release_pte_pages() and handle the
> > situation there.
> >
> > > >>> 		if (likely(writable)) {
> > > >>> 			if (likely(referenced)) {
> > > >>
> > > >> Do we need a list here? There should be at most one compound page we will see here, right?
> > > >
> > > > Um? It's outside the pte loop. We get here once per PMD range.
> > > >
> > > > 'page' argument to trace_mm_collapse_huge_page_isolate() is misleading:
> > > > it's just the last page handled in the loop.
> > > >
> > >
> > > Throughout the pte loop, we should only see at most one compound page, right?
> >
> > No. mremap(2) opens a possibility for HPAGE_PMD_NR compound pages for
> > single PMD range.
>
> Do you mean every PTE in the PMD is mapped by a sub page from different THPs?

Yes. Well, it was harder to achieve than I expected, but below is a test
case I've come up with. It maps 512 head pages within a single PMD range.
diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 3a98d5b2d6d8..9ae119234a39 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -703,6 +703,63 @@ static void collapse_full_of_compound(void)
 	munmap(p, hpage_pmd_size);
 }
 
+static void collapse_compound_extreme(void)
+{
+	void *p;
+	int i;
+
+	p = alloc_mapping();
+	for (i = 0; i < hpage_pmd_nr; i++) {
+		printf("\rConstruct PTE page table full of different PTE-mapped compound pages %3d/%d...",
+				i + 1, hpage_pmd_nr);
+
+		madvise(BASE_ADDR, hpage_pmd_size, MADV_HUGEPAGE);
+		fill_memory(BASE_ADDR, 0, hpage_pmd_size);
+		if (!check_huge(BASE_ADDR)) {
+			printf("Failed to allocate huge page\n");
+			exit(EXIT_FAILURE);
+		}
+		madvise(BASE_ADDR, hpage_pmd_size, MADV_NOHUGEPAGE);
+
+		p = mremap(BASE_ADDR - i * page_size,
+				i * page_size + hpage_pmd_size,
+				(i + 1) * page_size,
+				MREMAP_MAYMOVE | MREMAP_FIXED,
+				BASE_ADDR + 2 * hpage_pmd_size);
+		if (p == MAP_FAILED) {
+			perror("mremap+unmap");
+			exit(EXIT_FAILURE);
+		}
+
+		p = mremap(BASE_ADDR + 2 * hpage_pmd_size,
+				(i + 1) * page_size,
+				(i + 1) * page_size + hpage_pmd_size,
+				MREMAP_MAYMOVE | MREMAP_FIXED,
+				BASE_ADDR - (i + 1) * page_size);
+		if (p == MAP_FAILED) {
+			perror("mremap+alloc");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	munmap(BASE_ADDR, hpage_pmd_size);
+	fill_memory(p, 0, hpage_pmd_size);
+	if (!check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	if (wait_for_scan("Collapse PTE table full of different compound pages", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	validate_memory(p, 0, hpage_pmd_size);
+	munmap(p, hpage_pmd_size);
+}
+
 static void collapse_fork(void)
 {
 	int wstatus;
@@ -916,6 +973,7 @@ int main(void)
 	collapse_max_ptes_swap();
 	collapse_signle_pte_entry_compound();
 	collapse_full_of_compound();
+	collapse_compound_extreme();
 	collapse_fork();
 	collapse_fork_compound();
 	collapse_max_ptes_shared();

-- 
 Kirill A. Shutemov