On Tue, 11 Sep 2018 at 12:34, Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> wrote: > > A transparent huge page is represented by a single entry on an LRU list. > Therefore, we can only make unevictable an entire compound page, not > individual subpages. > > If a user tries to mlock() part of a huge page, we want the rest of the > page to be reclaimable. > > We handle this by keeping PTE-mapped huge pages on normal LRU lists: the > PMD on border of VM_LOCKED VMA will be split into PTE table. > > Introduction of THP migration breaks the rules around mlocking THP > pages. If we had a single PMD mapping of the page in mlocked VMA, the > page will get mlocked, regardless of PTE mappings of the page. > > For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in > remove_migration_pmd(). > > Anon THP pages can only be shared between processes via fork(). Mlocked > page can only be shared if parent mlocked it before forking, otherwise > CoW will be triggered on mlock(). > > For Anon-THP, we can fix the issue by munlocking the page on removing PTE > migration entry for the page. PTEs for the page will always come after > mlocked PMD: rmap walks VMAs from oldest to newest. > > Test-case: > > #include <unistd.h> > #include <sys/mman.h> > #include <sys/wait.h> > #include <linux/mempolicy.h> > #include <numaif.h> > > int main(void) > { > unsigned long nodemask = 4; > void *addr; > > addr = mmap((void *)0x20000000UL, 2UL << 20, PROT_READ | PROT_WRITE, > MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0); > > if (fork()) { > wait(NULL); > return 0; > } > > mlock(addr, 4UL << 10); > mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES, > &nodemask, 4, MPOL_MF_MOVE | MPOL_MF_MOVE_ALL); MPOL_MF_MOVE_ALL is actually not required to trigger the bug. > > return 0; > } > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > Reported-by: Vegard Nossum <vegard.nossum@xxxxxxxxx> Would you mind putting vegard.nossum@xxxxxxxxxx instead? > Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") The commit I bisected the problem to was actually a different one: commit c8633798497ce894c22ab083eb884c8294c537b2 Author: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> Date: Fri Sep 8 16:11:08 2017 -0700 mm: mempolicy: mbind and migrate_pages support thp migration But maybe you had a good reason to choose the other one instead. They are close together in any case, so I guess it would be hard to find a kernel with one commit and not the other. > Cc: <stable@xxxxxxxxxxxxxxx> [v4.14+] > Cc: Zi Yan <zi.yan@xxxxxxxxxxxxxx> > Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> > Cc: Vlastimil Babka <vbabka@xxxxxxx> > Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> You could also add: Link: https://lkml.org/lkml/2018/8/30/464 Thanks for debugging this. Vegard