On 11/03/2024 12:26, Matthew Wilcox wrote:
> On Mon, Mar 11, 2024 at 09:01:16AM +0000, Ryan Roberts wrote:
>> [ 153.499149] Call trace:
>> [ 153.499470]  uncharge_folio+0x1d0/0x2c8
>> [ 153.500045]  __mem_cgroup_uncharge_folios+0x5c/0xb0
>> [ 153.500795]  move_folios_to_lru+0x5bc/0x5e0
>> [ 153.501275]  shrink_lruvec+0x5f8/0xb30
>
>> And that code is from your commit 29f3843026cf ("mm: free folios directly in move_folios_to_lru()"), which is another patch in the same series. This suffers from the same problem: uncharge before removing the folio from the deferred list, so using the wrong lock - there are 2 sites in this function that do this.
>
> Two sites, but basically the same thing; one is for "the batch is full"
> and the other is "we finished the list".
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0e53999a865..f60c5b3977dc 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1842,6 +1842,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
>  		if (unlikely(folio_put_testzero(folio))) {
>  			__folio_clear_lru_flags(folio);
>  
> +			if (folio_test_large(folio) &&
> +			    folio_test_large_rmappable(folio))
> +				folio_undo_large_rmappable(folio);
>  			if (folio_batch_add(&free_folios, folio) == 0) {
>  				spin_unlock_irq(&lruvec->lru_lock);
>  				mem_cgroup_uncharge_folios(&free_folios);
>
>> A quick grep over the entire series has a lot of hits for "uncharge". I wonder if we need a full audit of that series for other places that could potentially be doing the same thing?
>
> I think this assertion will catch all occurrences of the same thing,
> as long as people who are testing are testing in a memcg.  My setup
> doesn't use a memcg, so I never saw any of this ;-(
>
> If you confirm this fixes it, I'll send two patches; a respin of the patch
> I sent on Sunday that calls undo_large_rmappable in this one extra place,
> and then a patch to add the assertions.

Good timing on your response - I've just finished testing! Note that my patch included both the site you have above and another that I fixed up speculatively in shrink_folio_list(), based on reviewing all the mem_cgroup_uncharge_folios() call sites.

I haven't been able to reproduce any issue with this patch (and the extra asserts) in place. I've run ~50 iterations over ~2 hours; the previous record was about 30 iterations before catching an oops. Given how difficult it now is to repro, I can't be sure this has definitely fixed all possible places, but it's looking positive.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0e53999a865..cf7d4cf47f1a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1436,6 +1436,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 */
 		nr_reclaimed += nr_pages;
 
+		if (folio_test_large(folio) && folio_test_large_rmappable(folio))
+			folio_undo_large_rmappable(folio);
+
 		if (folio_batch_add(&free_folios, folio) == 0) {
 			mem_cgroup_uncharge_folios(&free_folios);
 			try_to_unmap_flush();
@@ -1842,6 +1845,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		if (unlikely(folio_put_testzero(folio))) {
 			__folio_clear_lru_flags(folio);
 
+			if (folio_test_large(folio) && folio_test_large_rmappable(folio))
+				folio_undo_large_rmappable(folio);
+
 			if (folio_batch_add(&free_folios, folio) == 0) {
 				spin_unlock_irq(&lruvec->lru_lock);
 				mem_cgroup_uncharge_folios(&free_folios);

There is also a call to mem_cgroup_uncharge() in delete_from_lru_cache(), which I couldn't convince myself was safe. Perhaps you could do a quick audit of the call sites?
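For reference, the sort of assertion I mean is roughly the following - an illustrative sketch only; the helper name is made up, and whether it belongs in uncharge_folio(), the page freeing path, or both is a separate question:

/*
 * Sketch: flag any path that uncharges or frees a large rmappable folio
 * while it is still queued on a deferred split list. _deferred_list is
 * only peeked at here, without the split queue lock, since a missed
 * race just means the assert doesn't fire.
 */
static inline void folio_assert_not_deferred(struct folio *folio)
{
	VM_WARN_ON_ONCE_FOLIO(folio_test_large(folio) &&
			      folio_test_large_rmappable(folio) &&
			      !list_empty(&folio->_deferred_list),
			      folio);
}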
But taking a step back, I wonder whether the charge() and uncharge() functions should really be checking whether the folio is on a deferred split list and, if so, moving the folio to the correct list?
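Something like the following is the rough shape I'm imagining - very much a sketch (the helper name is made up, and taking the two split_queue_locks back to back like this is almost certainly too naive), just to illustrate requeueing the folio rather than leaving it on the old memcg's list:

/*
 * Sketch: if the folio is queued for deferred split on @old_queue,
 * move it to @new_queue (i.e. the queue it belongs on after the
 * charge/uncharge). Note the window between the two critical
 * sections where the folio sits on neither queue.
 */
static void folio_requeue_deferred_split(struct folio *folio,
					 struct deferred_split *old_queue,
					 struct deferred_split *new_queue)
{
	unsigned long flags;
	bool queued = false;

	spin_lock_irqsave(&old_queue->split_queue_lock, flags);
	if (!list_empty(&folio->_deferred_list)) {
		list_del_init(&folio->_deferred_list);
		old_queue->split_queue_len--;
		queued = true;
	}
	spin_unlock_irqrestore(&old_queue->split_queue_lock, flags);

	if (!queued)
		return;

	spin_lock_irqsave(&new_queue->split_queue_lock, flags);
	list_add_tail(&folio->_deferred_list, &new_queue->split_queue);
	new_queue->split_queue_len++;
	spin_unlock_irqrestore(&new_queue->split_queue_lock, flags);
}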