On Thu, Mar 07, 2024 at 09:50:09PM +0800, Yin, Fengwei wrote:
>
>
> On 3/7/2024 4:56 PM, wrote:
> > I just want to make sure I've understood correctly: CPU1's folio_put()
> > is not the last reference, and it keeps iterating through the local
> > list. Then CPU2 does the final folio_put() which causes list_del_init()
> > to modify the local list concurrently with CPU1's iteration, so CPU1
> > probably goes into the weeds?
>
> My understanding is this can not corrupt the folio->deferred_list as
> this folio was iterated already.

I am not convinced about that at all.  It's possible this isn't the
only problem, but deleting something from a list without holding (the
correct) lock is something you have to think incredibly hard about to
get right.  I didn't bother going any deeper into the analysis once I
spotted the locking problem, but the burden of proof is very much on
you to show that this is not a bug!

> But I did see other strange thing:
> [   76.269942] page: refcount:0 mapcount:1 mapping:0000000000000000
> index:0xffffbd0a0 pfn:0x2554a0
> [   76.270483] note: kcompactd0[62] exited with preempt_count 1
> [   76.271344] head: order:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
>
> This large folio has order 0? Maybe folio->_flags_1 was screwed?
>
> In free_unref_folios(), there is code like following:
>         if (order > 0 && folio_test_large_rmappable(folio))
>                 folio_undo_large_rmappable(folio);
>
> But with destroy_large_folio():
>         if (folio_test_large_rmappable(folio))
>                 folio_undo_large_rmappable(folio);
>
> Can it connect to the folio has zero refcount still in deferred list
> with Matthew's patch?
>
> Looks like folio order was cleared unexpected somewhere.

No, we intentionally clear it:

free_unref_folios -> free_unref_page_prepare -> free_pages_prepare ->
	page[1].flags &= ~PAGE_FLAGS_SECOND;

PAGE_FLAGS_SECOND includes the order, which is why we have to save it
away in folio->private so that we know what it is in the second loop.
So it's always been cleared by the time we call free_page_is_bad().