On (21/01/13 20:31), Hugh Dickins wrote:
> > We are running into lockups during the memory pressure tests on our
> > boards, which essentially NMI panic them. In short the test case is
> >
> > - THP shmem
> >   echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled
> >
> > - And a user-space process doing madvise(MADV_HUGEPAGE) on new mappings,
> >   and madvise(MADV_REMOVE) when it wants to remove the page range
> >
> > The problem boils down to the reverse locking chain:
> > kswapd does
> >
> >	lock_page(page) -> down_read(page->mapping->i_mmap_rwsem)
> >
> > madvise() process does
> >
> >	down_write(page->mapping->i_mmap_rwsem) -> lock_page(page)
> >
> >
> >	CPU0						CPU1
> >
> >	kswapd						vfs_fallocate()
> >	 shrink_node()					 shmem_fallocate()
> >	  shrink_active_list()				  unmap_mapping_range()
> >	   page_referenced()				   unmap_mapping_pages()
> >	    << lock page: PG_locked >>			    << down_write(mapping->i_mmap_rwsem) >>
> >	    rmap_walk_file()				    zap_page_range_single()
> >	     down_read(mapping->i_mmap_rwsem)		     unmap_page_range()
> >	      << W-locked on CPU1 >>			      __split_huge_pmd()
> >	      rwsem_down_read_failed()			       __lock_page()
> >	       __rwsem_down_read_failed_common()	        << PG_locked on CPU0 >>
> >	        schedule()				        wait_on_page_bit_common()
> >	         io_schedule()
>
> Very interesting, Sergey: many thanks for this report.

Thanks for the quick feedback.

> There is no doubt that kswapd is right in its lock ordering:
> __split_huge_pmd() is in the wrong to be attempting lock_page().
>
> Which used not to be done, but was added in 5.8's c444eb564fb1 ("mm:
> thp: make the THP mapcount atomic against __split_huge_pmd_locked()").

Hugh, I forgot to mention, we are facing these issues on 4.19.
Let me check if (maybe) we have cherry-picked c444eb564fb1.

	-ss