Re: [PATCH] Fix do_move_pages_to_node() error handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed 14-11-18 08:34:15, Michal Hocko wrote:
> On Tue 13-11-18 16:40:59, p.jaroszynski@xxxxxxxxx wrote:
> > From: Piotr Jaroszynski <pjaroszynski@xxxxxxxxxx>
> > 
> > migrate_pages() can return the number of pages that failed to migrate
> > instead of 0 or an error code. If that happens, the positive return is
> > treated as an error all the way up through the stack leading to the
> > move_pages() syscall returning a positive number. I believe this
> > regressed with commit a49bd4d71637 ("mm, numa: rework do_pages_move")
> > that refactored a lot of this code.
> 
> Yes this is correct.
> 
> > Fix this by treating positive returns as success in
> > do_move_pages_to_node() as that seems to most closely follow the
> > previous code. This still leaves the question whether silently
> > considering this case a success is the right thing to do as even the
> > status of the pages will be set as if they were successfully migrated,
> > but that seems to have been the case before as well.
> 
> Yes, I believe the previous semantic was just wrong and we want to fix
> it. Jan has already brought this up [1]. I believe we want to update the
> documentation rather than restore the previous hazy semantic.
> 
> Just wondering, how have you found out? Is there any real application
> failing because of the change or this is a result of some test?
> 
> [1] http://lkml.kernel.org/r/0329efa0984b9b0252ef166abb4498c0795fab36.1535113317.git.jstancek@xxxxxxxxxx

Btw. this is what I was suggesting back then (along with the man page
update suggested by Jan)

>From cfb88c266b645197135cde2905c2bfc82f6d82a9 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxxx>
Date: Wed, 14 Nov 2018 12:19:09 +0100
Subject: [PATCH] mm: fix do_pages_move error reporting

a49bd4d71637 ("mm, numa: rework do_pages_move") has changed the way how
we report error to layers above. As the changelog mentioned the semantic
was quite unclear previously because the return 0 could mean both
success and failure.

The above mentioned commit didn't get all the way down to fix this
completely because it doesn't report pages that we even haven't
attempted to migrate and therefore we cannot simply say that the
semantic is:
- err < 0 - errno
- err >= 0 number of non-migrated pages.

Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
 mm/migrate.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index f7e4bfdc13b7..aa53ebc523eb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1615,8 +1615,16 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			goto out_flush;
 
 		err = do_move_pages_to_node(mm, &pagelist, current_node);
-		if (err)
+		if (err) {
+			/*
+			 * Possitive err means the number of failed pages to
+			 * migrate. Make sure to report the rest of the
+			 * nr_pages is not migrated as well.
+			 */
+			if (err > 0)
+				err += nr_pages - i - 1;
 			goto out;
+		}
 		if (i > start) {
 			err = store_status(status, start, current_node, i - start);
 			if (err)
-- 
2.19.1

-- 
Michal Hocko
SUSE Labs




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux