On Mon, Mar 11, 2024 at 09:01:16AM +0000, Ryan Roberts wrote:
> [  153.499149] Call trace:
> [  153.499470]  uncharge_folio+0x1d0/0x2c8
> [  153.500045]  __mem_cgroup_uncharge_folios+0x5c/0xb0
> [  153.500795]  move_folios_to_lru+0x5bc/0x5e0
> [  153.501275]  shrink_lruvec+0x5f8/0xb30
>
> And that code is from your commit 29f3843026cf ("mm: free folios directly in
> move_folios_to_lru()") which is another patch in the same series. This suffers
> from the same problem; uncharge before removing folio from deferred list, so
> using wrong lock - there are 2 sites in this function that does this.

Two sites, but basically the same thing; one is for "the batch is full"
and the other is "we finished the list".

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0e53999a865..f60c5b3977dc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1842,6 +1842,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		if (unlikely(folio_put_testzero(folio))) {
 			__folio_clear_lru_flags(folio);
 
+			if (folio_test_large(folio) &&
+			    folio_test_large_rmappable(folio))
+				folio_undo_large_rmappable(folio);
 			if (folio_batch_add(&free_folios, folio) == 0) {
 				spin_unlock_irq(&lruvec->lru_lock);
 				mem_cgroup_uncharge_folios(&free_folios);

> A quick grep over the entire series has a lot of hits for "uncharge". I
> wonder if we need a full audit of that series for other places that
> could potentially be doing the same thing?

I think this assertion will catch all occurrences of the same thing,
as long as people who are testing are testing in a memcg.  My setup
doesn't use a memcg, so I never saw any of this ;-(

If you confirm this fixes it, I'll send two patches; a respin of the patch
I sent on Sunday that calls undo_large_rmappable in this one extra place,
and then a patch to add the assertions.