On Tue, Mar 31, 2020 at 7:08 AM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote: > > On Mon, Mar 30, 2020 at 11:50:41AM -0700, Yang Shi wrote: > > On Sat, Mar 28, 2020 at 5:33 AM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote: > > > > > > On Fri, Mar 27, 2020 at 09:17:00PM -0400, Zi Yan wrote: > > > > > The compound page may be locked here if the function called for the first > > > > > time for the page and not locked after that (becouse we've unlocked it we > > > > > saw it the first time). The same with LRU. > > > > > > > > > > > > > For the first time, the compound page is locked and not on LRU, so this VM_BUG_ON passes. > > > > For the second time and so on, the compound page is unlocked and on the LRU, > > > > so this VM_BUG_ON still passes. > > > > > > > > For base page, VM_BUG_ON passes. > > > > > > > > Other unexpected situation (a compound page is locked and on LRU) triggers the VM_BU_ON, > > > > but your VM_BUG_ON will not detect this situation, right? > > > > > > Right. I will rework this code. I've just realized it is racy: after > > > unlock and putback on LRU the page can be locked by somebody else and this > > > code can unlock it which completely borken. > > > > > > I'll pass down compound_pagelist to release_pte_pages() and handle the > > > situation there. > > > > > > > >>> if (likely(writable)) { > > > > >>> if (likely(referenced)) { > > > > >> > > > > >> Do we need a list here? There should be at most one compound page we will see here, right? > > > > > > > > > > Um? It's outside the pte loop. We get here once per PMD range. > > > > > > > > > > 'page' argument to trace_mm_collapse_huge_page_isolate() is misleading: > > > > > it's just the last page handled in the loop. > > > > > > > > > > > > > Throughout the pte loop, we should only see at most one compound page, right? > > > > > > No. mremap(2) opens a possibility for HPAGE_PMD_NR compound pages for > > > single PMD range. > > > > Do you mean every PTE in the PMD is mapped by a sub page from different THPs? > > Yes. > > Well, it was harder to archive than I expected, but below is a test case, > I've come up with. It maps 512 head pages within single PMD range. Thanks, this is very helpful. > > diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c > index 3a98d5b2d6d8..9ae119234a39 100644 > --- a/tools/testing/selftests/vm/khugepaged.c > +++ b/tools/testing/selftests/vm/khugepaged.c > @@ -703,6 +703,63 @@ static void collapse_full_of_compound(void) > munmap(p, hpage_pmd_size); > } > > +static void collapse_compound_extreme(void) > +{ > + void *p; > + int i; > + > + p = alloc_mapping(); > + for (i = 0; i < hpage_pmd_nr; i++) { > + printf("\rConstruct PTE page table full of different PTE-mapped compound pages %3d/%d...", > + i + 1, hpage_pmd_nr); > + > + madvise(BASE_ADDR, hpage_pmd_size, MADV_HUGEPAGE); > + fill_memory(BASE_ADDR, 0, hpage_pmd_size); > + if (!check_huge(BASE_ADDR)) { > + printf("Failed to allocate huge page\n"); > + exit(EXIT_FAILURE); > + } > + madvise(BASE_ADDR, hpage_pmd_size, MADV_NOHUGEPAGE); > + > + p = mremap(BASE_ADDR - i * page_size, > + i * page_size + hpage_pmd_size, > + (i + 1) * page_size, > + MREMAP_MAYMOVE | MREMAP_FIXED, > + BASE_ADDR + 2 * hpage_pmd_size); > + if (p == MAP_FAILED) { > + perror("mremap+unmap"); > + exit(EXIT_FAILURE); > + } > + > + p = mremap(BASE_ADDR + 2 * hpage_pmd_size, > + (i + 1) * page_size, > + (i + 1) * page_size + hpage_pmd_size, > + MREMAP_MAYMOVE | MREMAP_FIXED, > + BASE_ADDR - (i + 1) * page_size); > + if (p == MAP_FAILED) { > + perror("mremap+alloc"); > + exit(EXIT_FAILURE); > + } > + } > + > + munmap(BASE_ADDR, hpage_pmd_size); > + fill_memory(p, 0, hpage_pmd_size); > + if (!check_huge(p)) > + success("OK"); > + else > + fail("Fail"); > + > + if (wait_for_scan("Collapse PTE table full of different compound pages", p)) > + fail("Timeout"); > + else if (check_huge(p)) > + success("OK"); > + else > + fail("Fail"); > + > + validate_memory(p, 0, hpage_pmd_size); > + munmap(p, hpage_pmd_size); > +} > + > static void collapse_fork(void) > { > int wstatus; > @@ -916,6 +973,7 @@ int main(void) > collapse_max_ptes_swap(); > collapse_signle_pte_entry_compound(); > collapse_full_of_compound(); > + collapse_compound_extreme(); > collapse_fork(); > collapse_fork_compound(); > collapse_max_ptes_shared(); > -- > Kirill A. Shutemov