On Fri, Sep 25, 2020 at 11:21:58AM +0800, Huang, Ying wrote:
> Rafael Aquini <aquini@xxxxxxxxxx> writes:
> >> Or, can you help to run the test with a debug kernel based on upstream
> >> kernel.  I can provide some debug patch.
> >>
> >
> > Sure, I can set your patches to run with the test cases we have that tend to
> > reproduce the issue with some degree of success.
>
> Thanks!
>
> I found a race condition.  During THP splitting, "head" may be unlocked
> before calling split_swap_cluster(), because head != page during
> deferred splitting.  So we should call split_swap_cluster() before
> unlocking.  The debug patch to do that is as below.  Can you help to
> test it?
>

I was finally able to grab a good crashdump and confirm that head is
really not locked. I still need to dig into it to figure out more about
the crash. I guess your patch will guarantee that the lock on head is
held, but it still doesn't explain how the THP got marked as
PG_swapcache, given that add_to_swap()->get_swap_page() should fail,
right?

I'll give your patch a run over the weekend; hopefully we'll have more
info on this next week.

> Best Regards,
> Huang, Ying
>
> ------------------------8<----------------------------
> From 24ce0736a9f587d2dba12f12491c88d3e296a491 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Fri, 25 Sep 2020 11:10:56 +0800
> Subject: [PATCH] dbg: Call split_swap_clsuter() before unlock page during
>  split THP
>
> ---
>  mm/huge_memory.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index faadc449cca5..8d79e5e6b46e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2444,6 +2444,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  
>  	remap_page(head);
>  
> +	if (PageSwapCache(head)) {
> +		swp_entry_t entry = { .val = page_private(head) };
> +
> +		split_swap_cluster(entry);
> +	}
> +
>  	for (i = 0; i < HPAGE_PMD_NR; i++) {
>  		struct page *subpage = head + i;
>  		if (subpage == page)
> @@ -2678,12 +2684,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  		}
>  
>  		__split_huge_page(page, list, end, flags);
> -		if (PageSwapCache(head)) {
> -			swp_entry_t entry = { .val = page_private(head) };
> -
> -			ret = split_swap_cluster(entry);
> -		} else
> -			ret = 0;
> +		ret = 0;
>  	} else {
>  		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
>  			pr_alert("total_mapcount: %u, page_count(): %u\n",
> -- 
> 2.28.0
>
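
To make the ordering problem in the quoted patch concrete, here is a
minimal user-space sketch in C. It is only a toy model, not kernel code:
struct toy_page and the toy_*() helpers are hypothetical names invented
for illustration, and the model assumes that the splitting loop unlocks
every subpage except the one passed in as "page", as the quoted hunk
suggests. It counts how often the swap cluster split would run while
head is already unlocked.

```c
#include <assert.h>   /* used by the usage checks shown below */
#include <stdbool.h>

/* Hypothetical stand-in for the head page; not the kernel's struct page. */
struct toy_page {
	bool locked;          /* models PG_locked on head */
	bool swapcache;       /* models PageSwapCache(head) */
	int unlocked_splits;  /* cluster splits that ran with head unlocked */
};

/* Models split_swap_cluster(): records whether head was locked at the call. */
static void toy_split_swap_cluster(struct toy_page *head)
{
	if (!head->locked)
		head->unlocked_splits++;
}

/*
 * Models __split_huge_page(). With split_first set, the cluster split runs
 * before the unlock loop (the ordering of the debug patch). The loop is
 * assumed to unlock every subpage except "page", so head gets unlocked
 * here whenever head != page.
 */
static void toy_split_huge_page(struct toy_page *head, bool head_is_page,
				bool split_first)
{
	if (split_first && head->swapcache)
		toy_split_swap_cluster(head);
	if (!head_is_page)
		head->locked = false;
}

/*
 * Models split_huge_page_to_list(). Returns the number of cluster splits
 * that ran without the lock on head.
 */
int toy_split_to_list(bool head_is_page, bool patched)
{
	struct toy_page head = { .locked = true, .swapcache = true };

	toy_split_huge_page(&head, head_is_page, patched);
	if (!patched && head.swapcache)
		toy_split_swap_cluster(&head);  /* original ordering */
	return head.unlocked_splits;
}
```

In this model, only the unpatched ordering with head != page (the
deferred splitting case) reaches the cluster split without the lock:

    assert(toy_split_to_list(false, false) == 1);
    assert(toy_split_to_list(false, true) == 0);

The sketch says nothing about how the THP ended up in the swap cache in
the first place; it only illustrates the locking order.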