Re: [PATCH] mm/vmscan: Fix hard LOCKUP in function isolate_lru_folios

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 14 Aug 2024 14:27:43 -0700

On Wed, 14 Aug 2024 17:18:25 +0800 liuye <liuye@xxxxxxxxxx> wrote:

> This fixes the following hard lockup in function isolate_lru_folios
> when memory reclaim.If the LRU mostly contains ineligible folios
> May trigger watchdog.
> 
> watchdog: Watchdog detected hard LOCKUP on cpu 173
> RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
> Call Trace:
> 	_raw_spin_lock_irqsave+0x31/0x40
> 	folio_lruvec_lock_irqsave+0x5f/0x90
> 	folio_batch_move_lru+0x91/0x150
> 	lru_add_drain_per_cpu+0x1c/0x40
> 	process_one_work+0x17d/0x350
> 	worker_thread+0x27b/0x3a0
> 	kthread+0xe8/0x120
> 	ret_from_fork+0x34/0x50
> 	ret_from_fork_asm+0x1b/0x30
> 
> lruvec->lru_lock owner：
> 
> PID: 2865     TASK: ffff888139214d40  CPU: 40   COMMAND: "kswapd0"
>  #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
>  #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
>  #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
>  #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
>  #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
>     [exception RIP: isolate_lru_folios+403]
>     RIP: ffffffffa597df53  RSP: ffffc90006fb7c28  RFLAGS: 00000002
>     RAX: 0000000000000001  RBX: ffffc90006fb7c60  RCX: ffffea04a2196f88
>     RDX: ffffc90006fb7c60  RSI: ffffc90006fb7c60  RDI: ffffea04a2197048
>     RBP: ffff88812cbd3010   R8: ffffea04a2197008   R9: 0000000000000001
>     R10: 0000000000000000  R11: 0000000000000001  R12: ffffea04a2197008
>     R13: ffffea04a2197048  R14: ffffc90006fb7de8  R15: 0000000003e3e937
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>     <NMI exception stack>
>  #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
>  #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
>  #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
>  #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
>  #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
> crash>

Well that's bad.

> Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")

Merged in 2016.

Can you please describe how to reproduce this?  Under what circumstances
does it occur?  Why do you think it took eight years to be discovered?

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1655,6 +1655,7 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>  	unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
>  	unsigned long skipped = 0;
>  	unsigned long scan, total_scan, nr_pages;
> +	unsigned long max_nr_skipped = 0;
>  	LIST_HEAD(folios_skipped);
>  
>  	total_scan = 0;
> @@ -1669,10 +1670,12 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>  		nr_pages = folio_nr_pages(folio);
>  		total_scan += nr_pages;
>  
> -		if (folio_zonenum(folio) > sc->reclaim_idx ||
> -				skip_cma(folio, sc)) {
> +		/* Using max_nr_skipped to prevent hard LOCKUP*/
> +		if ((max_nr_skipped < SWAP_CLUSTER_MAX_SKIPPED) &&
> +			(folio_zonenum(folio) > sc->reclaim_idx || skip_cma(folio, sc))) {
>  			nr_skipped[folio_zonenum(folio)] += nr_pages;
>  			move_to = &folios_skipped;
> +			max_nr_skipped++;
>  			goto move;
>  		}

It looks like that will fix, but perhaps something more fundamental
needs to be done - we're doing a tremendous amount of pretty pointless
work here.  Answers to my above questions will help us resolve this.

Thanks.