Michal Hocko <mhocko@xxxxxxxx> writes: > On Mon 07-08-23 22:31:52, Alistair Popple wrote: >> >> Michal Hocko <mhocko@xxxxxxxx> writes: >> >> > On Mon 07-08-23 16:39:44, Alistair Popple wrote: >> >> When a page fails to migrate move_pages() returns the error code in a >> >> per-page array of status values. The function call itself is also >> >> supposed to return a summary error code indicating that a failure >> >> occurred. >> >> >> >> This doesn't always happen. Instead success can be returned even >> >> though some pages failed to migrate. This is due to incorrectly >> >> returning the error code from store_status() rather than the code from >> >> add_page_for_migration. Fix this by only returning an error from >> >> store_status() if the store actually failed. >> > >> > Error reporting by this syscall has been really far from >> > straightforward. Please read through a49bd4d71637 and the section "On a >> > side note". >> > Is there any specific reason you are trying to address this now or is >> > this motivated by the code inspection? >> >> Thanks Michal. There was no specific reason to address this now other >> than I came across this behaviour when updating the migration selftest >> to inspect the status array and thought it was odd. I was seeing pages >> had failed to migrate according to the status argument even though >> move_pages() had returned 0 (ie. success) rather than a number of >> non-migrated pages. > > It is good to mention such a motivation in the changelog to make it > clear. Also do we have a specific test case which trigger this case? Not explicitly/reliably although I could write one. >> If I'm interpreting the side note correctly the behaviour you were >> concerned about was the opposite - returning a fail return code from >> move_pages() but not indicating failure in the status array. >> >> That said I'm happy to leave the behaviour as is, although in that case >> an update to the man page is in order to clarify a return value of 0 >> from move_pages() doesn't actually mean all pages were successfully >> migrated. > > While I would say that it is better to let old dogs sleep I do not mind > changing the behavior and see whether anything breaks. I suspect nobody > except for couple of test cases hardcoded to the original behavior will > notice. > >> >> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx> >> >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move") > > The patch itself looks good. I am not sure the fixes tag is accurate. > Has the reporting been correct before this change? I didn't have time to > re-read the original code which was quite different. I dug deeper into the history and the fixes tag is wrong. The behaviour was actually introduced way back in commit e78bbfa82624 ("mm: stop returning -ENOENT from sys_move_pages() if nothing got migrated"). As you may guess from the title it was intentional, so suspect it is better to update documentation. > Anyway > Acked-by: Michal Hocko <mhocko@xxxxxxxx> Thanks for looking, but I will drop this and see if I can get the man page updated. > Anyway rewriting this function to clarify the error handling would be a > nice exercise if somebody is interested. Yeah, everytime I look at this function I want to do that but haven't yet found the time. >> >> --- >> >> mm/migrate.c | 4 +++- >> >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> >> index 24baad2571e3..bb3a37245e13 100644 >> >> --- a/mm/migrate.c >> >> +++ b/mm/migrate.c >> >> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, >> >> * If the page is already on the target node (!err), store the >> >> * node, otherwise, store the err. >> >> */ >> >> - err = store_status(status, i, err ? : current_node, 1); >> >> + err1 = store_status(status, i, err ? : current_node, 1); >> >> + if (err1) >> >> + err = err1; >> >> if (err) >> >> goto out_flush; >> >> >> >> -- >> >> 2.39.2