On Tue, Dec 06, 2022 at 06:16:07PM +0100, Jann Horn wrote:
commit 2ba99c5e08812494bc57f319fb562f527d9bacd8 upstream.

Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
ensure that the page table was not removed by khugepaged in between.

However, lockless_pages_from_mm() still requires that the page table is
not concurrently freed. Fix it by sending IPIs (if the architecture uses
semi-RCU-style page table freeing) before freeing/reusing page tables.

Link: https://lkml.kernel.org/r/20221129154730.2274278-2-jannh@xxxxxxxxxx
Link: https://lkml.kernel.org/r/20221128180252.1684965-2-jannh@xxxxxxxxxx
Link: https://lkml.kernel.org/r/20221125213714.4115729-2-jannh@xxxxxxxxxx
Fixes: ba76149f47d8 ("thp: khugepaged")
Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
Reviewed-by: Yang Shi <shy828301@xxxxxxxxx>
Acked-by: David Hildenbrand <david@xxxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
[manual backport: two of the three places in khugepaged that can free
ptes were refactored into a common helper between 5.15 and 6.0;
TLB flushing was refactored between 5.4 and 5.10;
TLB flushing was refactored between 4.19 and 5.4;
pmd collapse for PTE-mapped THP was only added in 5.4]
Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
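The ordering the commit message relies on can be pictured with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the names walker_begin(), walker_end(), sync_would_complete() and try_free_page_table() are hypothetical, and the model assumes that a lockless walker runs with IRQs disabled, so that an IPI-based sync cannot finish while any CPU is still mid-walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical model state: true while a CPU is inside an IRQs-off lockless walk. */
static bool irqs_off[NR_CPUS];

/* Model of a lockless page table walker entering and leaving its IRQ-disabled section. */
static void walker_begin(int cpu) { irqs_off[cpu] = true; }
static void walker_end(int cpu)   { irqs_off[cpu] = false; }

/*
 * Model of the IPI-based sync: a CPU handles the IPI only once it has IRQs
 * enabled again, so the sync can complete only when no CPU is mid-walk.
 * Returns true if the sync would complete now, false if it would still wait.
 */
static bool sync_would_complete(void)
{
	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		if (irqs_off[cpu])
			return false;
	return true;
}

/*
 * Model of the fixed ordering: the page table is marked freed only after the
 * sync has completed. Returns false (and frees nothing) while a walker is active.
 */
static bool try_free_page_table(bool *table_freed)
{
	if (!sync_would_complete())
		return false;
	*table_freed = true;
	return true;
}
```

In this model, calling walker_begin(1) and then try_free_page_table() leaves the table in place; after walker_end(1) the same call succeeds. The real kernel blocks in the sync rather than returning false; the model only shows why freeing after the sync cannot overlap with an IRQs-off walk.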

This one actually fails on v4.19:

mm/khugepaged.c: In function 'collapse_huge_page':
mm/khugepaged.c:1048:9: error: implicit declaration of function 'tlb_remove_table_sync_one'; did you mean 'tlb_remove_page_size'? [-Werror=implicit-function-declaration]
 1048 |         tlb_remove_table_sync_one();
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~
      |         tlb_remove_page_size

Presumably because we don't have 9de7d833e370 ("s390/tlb: Convert to
generic mmu_gather") on those kernels.

I'll drop both backports from <= 4.19 kernels.

-- 
Thanks,
Sasha