Alistair Popple <apopple@xxxxxxxxxx> writes: > Michal Hocko <mhocko@xxxxxxxx> writes: > >> On Mon 07-08-23 22:31:52, Alistair Popple wrote: >>> >>> Michal Hocko <mhocko@xxxxxxxx> writes: >>> >>> > On Mon 07-08-23 16:39:44, Alistair Popple wrote: >>> >> When a page fails to migrate move_pages() returns the error code in a >>> >> per-page array of status values. The function call itself is also >>> >> supposed to return a summary error code indicating that a failure >>> >> occurred. >>> >> >>> >> This doesn't always happen. Instead success can be returned even >>> >> though some pages failed to migrate. This is due to incorrectly >>> >> returning the error code from store_status() rather than the code from >>> >> add_page_for_migration. Fix this by only returning an error from >>> >> store_status() if the store actually failed. >>> > >>> > Error reporting by this syscall has been really far from >>> > straightforward. Please read through a49bd4d71637 and the section "On a >>> > side note". >>> > Is there any specific reason you are trying to address this now or is >>> > this motivated by the code inspection? >>> >>> Thanks Michal. There was no specific reason to address this now other >>> than I came across this behaviour when updating the migration selftest >>> to inspect the status array and thought it was odd. I was seeing pages >>> had failed to migrate according to the status argument even though >>> move_pages() had returned 0 (ie. success) rather than a number of >>> non-migrated pages. >> >> It is good to mention such a motivation in the changelog to make it >> clear. Also do we have a specific test case which trigger this case? > > Not explicitly/reliably although I could write one. > >>> If I'm interpreting the side note correctly the behaviour you were >>> concerned about was the opposite - returning a fail return code from >>> move_pages() but not indicating failure in the status array. IIUC, we cannot avoid this even if leaving code untouched. In move_pages_and_store_status(), if some pages fails to be migrated, we will not store status. In fact, we don't know which pages failed to be migrated in kernel too. >>> That said I'm happy to leave the behaviour as is, although in that case >>> an update to the man page is in order to clarify a return value of 0 >>> from move_pages() doesn't actually mean all pages were successfully >>> migrated. IMHO, return value is more important than "status" as the reason I mentioned above. At least, we can make one thing correct. >> While I would say that it is better to let old dogs sleep I do not mind >> changing the behavior and see whether anything breaks. I suspect nobody >> except for couple of test cases hardcoded to the original behavior will >> notice. >> >>> >> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx> >>> >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move") >> >> The patch itself looks good. I am not sure the fixes tag is accurate. >> Has the reporting been correct before this change? I didn't have time to >> re-read the original code which was quite different. > > I dug deeper into the history and the fixes tag is wrong. The behaviour > was actually introduced way back in commit e78bbfa82624 ("mm: stop > returning -ENOENT from sys_move_pages() if nothing got migrated"). As > you may guess from the title it was intentional, so suspect it is better > to update documentation. Can we change the code but keep the -ENOENT behavior with some special case? -- Best Regards, Huang, Ying >> Anyway >> Acked-by: Michal Hocko <mhocko@xxxxxxxx> > > Thanks for looking, but I will drop this and see if I can get the man > page updated. > >> Anyway rewriting this function to clarify the error handling would be a >> nice exercise if somebody is interested. > > Yeah, everytime I look at this function I want to do that but haven't > yet found the time. > >>> >> --- >>> >> mm/migrate.c | 4 +++- >>> >> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >> >>> >> diff --git a/mm/migrate.c b/mm/migrate.c >>> >> index 24baad2571e3..bb3a37245e13 100644 >>> >> --- a/mm/migrate.c >>> >> +++ b/mm/migrate.c >>> >> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, >>> >> * If the page is already on the target node (!err), store the >>> >> * node, otherwise, store the err. >>> >> */ >>> >> - err = store_status(status, i, err ? : current_node, 1); >>> >> + err1 = store_status(status, i, err ? : current_node, 1); >>> >> + if (err1) >>> >> + err = err1; >>> >> if (err) >>> >> goto out_flush; >>> >> >>> >> -- >>> >> 2.39.2