Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Mon, 11 Mar 2024 12:26:25 +0000

On Mon, Mar 11, 2024 at 09:01:16AM +0000, Ryan Roberts wrote:
> [  153.499149] Call trace:
> [  153.499470]  uncharge_folio+0x1d0/0x2c8
> [  153.500045]  __mem_cgroup_uncharge_folios+0x5c/0xb0
> [  153.500795]  move_folios_to_lru+0x5bc/0x5e0
> [  153.501275]  shrink_lruvec+0x5f8/0xb30

> And that code is from your commit 29f3843026cf ("mm: free folios directly in move_folios_to_lru()") which is another patch in the same series. This suffers from the same problem; uncharge before removing folio from deferred list, so using wrong lock - there are 2 sites in this function that does this.

Two sites, but basically the same thing; one is for "the batch is full"
and the other is "we finished the list".

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0e53999a865..f60c5b3977dc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1842,6 +1842,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		if (unlikely(folio_put_testzero(folio))) {
 			__folio_clear_lru_flags(folio);
 
+			if (folio_test_large(folio) &&
+			    folio_test_large_rmappable(folio))
+				folio_undo_large_rmappable(folio);
 			if (folio_batch_add(&free_folios, folio) == 0) {
 				spin_unlock_irq(&lruvec->lru_lock);
 				mem_cgroup_uncharge_folios(&free_folios);

> A quick grep over the entire series has a lot of hits for "uncharge". I
> wonder if we need a full audit of that series for other places that
> could potentially be doing the same thing?

I think this assertion will catch all occurrences of the same thing,
as long as people who are testing are testing in a memcg.  My setup
doesn't use a memcg, so I never saw any of this ;-(

If you confirm this fixes it, I'll send two patches; a respin of the patch
I sent on Sunday that calls undo_large_rmappable in this one extra place,
and then a patch to add the assertions.