Re: [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



+ David Hildenbrand

On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@xxxxxxxxx> wrote:

> In offline_pages(), do_migrate_range() may potentially retry forever if
> the migration fails. Add a return value for do_migrate_range(), and
> allow offline_page() to try migrating pages 5 times before erroring
> out, similar to how migration failures in __alloc_contig_migrate_range()
> is handled.

I'm curious if this could cause unexpected behavioral differences to memory
hotplugging users, and how '5' is chosen.  Could you please enlighten me?

> 
> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> ---
>  mm/memory_hotplug.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 621ae1015106..49402442ea3b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1770,13 +1770,14 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  	return 0;
>  }
>  
> -static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> +static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)

Seems the return value is used for only knowing if it is failed or not.  If
there is no plan to use the error code in future, what about using bool return
type?

>  {
>  	struct folio *folio;
>  	unsigned long pfn;
>  	LIST_HEAD(source);
>  	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
> +	int ret = 0;
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		struct page *page;
> @@ -1833,7 +1834,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>  			.reason = MR_MEMORY_HOTPLUG,
>  		};
> -		int ret;
>  
>  		/*
>  		 * We have checked that migration range is on a single zone so
> @@ -1863,6 +1863,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			putback_movable_pages(&source);
>  		}
>  	}
> +	return ret;
>  }
>  
>  static int __init cmdline_parse_movable_node(char *p)
> @@ -1940,6 +1941,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	const int node = zone_to_nid(zone);
>  	unsigned long flags;
>  	struct memory_notify arg;
> +	unsigned int tries = 0;
>  	char *reason;
>  	int ret;
>  
> @@ -2028,11 +2030,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  
>  			ret = scan_movable_pages(pfn, end_pfn, &pfn);
>  			if (!ret) {
> -				/*
> -				 * TODO: fatal migration failures should bail
> -				 * out
> -				 */
> -				do_migrate_range(pfn, end_pfn);
> +				if (do_migrate_range(pfn, end_pfn) && ++tries == 5)
> +					ret = -EBUSY;
>  			}

In the '++tries == 5' case, users will show the failure reason as "unmovable
page" from the debug log.  What about setting 'reason' here to be more
specific, e.g., "multiple migration failures"?

Also, my humble understanding of the intention of this change is as follow.  If
there are 'AS_WRITEBACK_MAY_BLOCK' pages in the migration target range,
do_migrate_range() will continuously fail.  And hence this could become
infinite loop.  Pleae let me know if I'm misunderstanding.

But if I'm not wrong...  There is a check for expected failures above
(scan_movable_pages()).  What about adding 'AS_WRITEBACK_MAY_BLOCK' pages
existence check there?

>  		} while (!ret);
>  
> -- 
> 2.43.5


Thanks,
SJ




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux