On Mon 07-08-23 22:31:52, Alistair Popple wrote: > > Michal Hocko <mhocko@xxxxxxxx> writes: > > > On Mon 07-08-23 16:39:44, Alistair Popple wrote: > >> When a page fails to migrate move_pages() returns the error code in a > >> per-page array of status values. The function call itself is also > >> supposed to return a summary error code indicating that a failure > >> occurred. > >> > >> This doesn't always happen. Instead success can be returned even > >> though some pages failed to migrate. This is due to incorrectly > >> returning the error code from store_status() rather than the code from > >> add_page_for_migration. Fix this by only returning an error from > >> store_status() if the store actually failed. > > > > Error reporting by this syscall has been really far from > > straightforward. Please read through a49bd4d71637 and the section "On a > > side note". > > Is there any specific reason you are trying to address this now or is > > this motivated by the code inspection? > > Thanks Michal. There was no specific reason to address this now other > than I came across this behaviour when updating the migration selftest > to inspect the status array and thought it was odd. I was seeing pages > had failed to migrate according to the status argument even though > move_pages() had returned 0 (ie. success) rather than a number of > non-migrated pages. It is good to mention such a motivation in the changelog to make it clear. Also do we have a specific test case which trigger this case? > If I'm interpreting the side note correctly the behaviour you were > concerned about was the opposite - returning a fail return code from > move_pages() but not indicating failure in the status array. > > That said I'm happy to leave the behaviour as is, although in that case > an update to the man page is in order to clarify a return value of 0 > from move_pages() doesn't actually mean all pages were successfully > migrated. While I would say that it is better to let old dogs sleep I do not mind changing the behavior and see whether anything breaks. I suspect nobody except for couple of test cases hardcoded to the original behavior will notice. > >> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx> > >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move") The patch itself looks good. I am not sure the fixes tag is accurate. Has the reporting been correct before this change? I didn't have time to re-read the original code which was quite different. Anyway Acked-by: Michal Hocko <mhocko@xxxxxxxx> Anyway rewriting this function to clarify the error handling would be a nice exercise if somebody is interested. > >> --- > >> mm/migrate.c | 4 +++- > >> 1 file changed, 3 insertions(+), 1 deletion(-) > >> > >> diff --git a/mm/migrate.c b/mm/migrate.c > >> index 24baad2571e3..bb3a37245e13 100644 > >> --- a/mm/migrate.c > >> +++ b/mm/migrate.c > >> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, > >> * If the page is already on the target node (!err), store the > >> * node, otherwise, store the err. > >> */ > >> - err = store_status(status, i, err ? : current_node, 1); > >> + err1 = store_status(status, i, err ? : current_node, 1); > >> + if (err1) > >> + err = err1; > >> if (err) > >> goto out_flush; > >> > >> -- > >> 2.39.2 -- Michal Hocko SUSE Labs