Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed

Ryan Roberts <ryan.roberts@xxxxxxx> · Mon, 11 Mar 2024 12:36:00 +0000

On 11/03/2024 12:26, Matthew Wilcox wrote:
> On Mon, Mar 11, 2024 at 09:01:16AM +0000, Ryan Roberts wrote:
>> [  153.499149] Call trace:
>> [  153.499470]  uncharge_folio+0x1d0/0x2c8
>> [  153.500045]  __mem_cgroup_uncharge_folios+0x5c/0xb0
>> [  153.500795]  move_folios_to_lru+0x5bc/0x5e0
>> [  153.501275]  shrink_lruvec+0x5f8/0xb30
> 
>> And that code is from your commit 29f3843026cf ("mm: free folios directly in move_folios_to_lru()") which is another patch in the same series. This suffers from the same problem; uncharge before removing folio from deferred list, so using wrong lock - there are 2 sites in this function that does this.
> 
> Two sites, but basically the same thing; one is for "the batch is full"
> and the other is "we finished the list".
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0e53999a865..f60c5b3977dc 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1842,6 +1842,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
>  		if (unlikely(folio_put_testzero(folio))) {
>  			__folio_clear_lru_flags(folio);
>  
> +			if (folio_test_large(folio) &&
> +			    folio_test_large_rmappable(folio))
> +				folio_undo_large_rmappable(folio);
>  			if (folio_batch_add(&free_folios, folio) == 0) {
>  				spin_unlock_irq(&lruvec->lru_lock);
>  				mem_cgroup_uncharge_folios(&free_folios);
> 
>> A quick grep over the entire series has a lot of hits for "uncharge". I
>> wonder if we need a full audit of that series for other places that
>> could potentially be doing the same thing?
> 
> I think this assertion will catch all occurrences of the same thing,
> as long as people who are testing are testing in a memcg.  My setup
> doesn't use a memcg, so I never saw any of this ;-(
> 
> If you confirm this fixes it, I'll send two patches; a respin of the patch
> I sent on Sunday that calls undo_large_rmappable in this one extra place,
> and then a patch to add the assertions.

Good timing on your response - I've just finished testing! Although my patch
included both the site you have above and another that I fixed up speculatively
in shrink_folio_list() based on reviewing all mem_cgroup_uncharge_folios() call
sites.

I haven't been able to reproduce any issue with this patch (and the extra
asserts) in place. I've run ~50 iterations over ~2 hours. Previous record was
about 30 iterations before catching an oops. Given how difficult it now is to
repro, I can't be sure this has definitely fixed all possible places, but its
looking positive.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0e53999a865..cf7d4cf47f1a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1436,6 +1436,9 @@ static unsigned int shrink_folio_list(struct list_head
*folio_list,
                 */
                nr_reclaimed += nr_pages;

+               if (folio_test_large(folio) && folio_test_large_rmappable(folio))
+                       folio_undo_large_rmappable(folio);
+
                if (folio_batch_add(&free_folios, folio) == 0) {
                        mem_cgroup_uncharge_folios(&free_folios);
                        try_to_unmap_flush();
@@ -1842,6 +1845,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
                if (unlikely(folio_put_testzero(folio))) {
                        __folio_clear_lru_flags(folio);

+                       if (folio_test_large(folio) &&
folio_test_large_rmappable(folio))
+                               folio_undo_large_rmappable(folio);
+
                        if (folio_batch_add(&free_folios, folio) == 0) {
                                spin_unlock_irq(&lruvec->lru_lock);
                                mem_cgroup_uncharge_folios(&free_folios);



There is also a call to mem_cgroup_uncharge() in delete_from_lru_cache(), which
I couldn't convince myself was safe. Perhaps you could do a quick audit of the
call sites?

But taking a step back, I wonder whether the charge() and uncharge() functions
should really be checking to see if the folio is on a deferred split list and if
so, then move the folio to the corect list?