On Fri, Jan 17, 2020 at 03:30:18PM -0800, Yang Shi wrote: >On Fri, Jan 17, 2020 at 2:27 PM Wei Yang <richardw.yang@xxxxxxxxxxxxxxx> wrote: >> >> On Fri, Jan 17, 2020 at 03:45:34PM +0800, Wei Yang wrote: >> >If we get here after successfully adding page to list, err would be >> >the number of pages in the list. >> > >> >Current code has two problems: >> > >> > * on success, 0 is not returned >> > * on error, the real error code is not returned >> > >> >> Well, this breaks the user interface. User would receive 1 even the migration >> succeed. >> >> The change is introduced by e0153fc2c760 ("mm: move_pages: return valid node >> id in status if the page is already on the target node"). > >Yes, it may return a value which is > 0. But, it seems do_pages_move() >could return > 0 value even before this commit. > >For example, if I read the code correctly, it would do: > >If we already have some pages on the queue then >add_page_for_migration() return error, then do_move_pages_to_node() is >called, but it may return > 0 value (the number of pages that were >*not* migrated by migrate_pages()), then the code flow would just jump >to "out" and return the value. And, it may happen to be 1. > This is another point I think current code is not working well. And actually, the behavior is not well defined or our kernel is broken for a while. When you look at the man page, it says: RETURN VALUE On success move_pages() returns zero. On error, it returns -1, and sets errno to indicate the error So per my understanding, the design is to return -1 on error instead of the pages not managed to move. For the user interface, if original code check 0 for success, your change breaks it. Because your code would return 1 instead of 0. Suppose most user just read the man page for programming instead of reading the kernel source code. I believe we need to fix it. Not sure how to include some user interface related developer to look into this issue. Hope this thread may catch their eyes. >I'm not sure if it breaks the user interface since the behavior has >been existed for years, and it looks nobody complains about it. Maybe >glibc helps hide it or people just care if it is 0 and the status. > >> >> >Signed-off-by: Wei Yang <richardw.yang@xxxxxxxxxxxxxxx> >> >--- >> > mm/migrate.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> >diff --git a/mm/migrate.c b/mm/migrate.c >> >index 557da996b936..c3ef70de5876 100644 >> >--- a/mm/migrate.c >> >+++ b/mm/migrate.c >> >@@ -1677,7 +1677,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, >> > err1 = do_move_pages_to_node(mm, &pagelist, current_node); >> > if (!err1) >> > err1 = store_status(status, start, current_node, i - start); >> >- if (!err) >> >+ if (err >= 0) >> > err = err1; >> > out: >> > return err; >> >-- >> >2.17.1 >> >> -- >> Wei Yang >> Help you, Help me >> -- Wei Yang Help you, Help me