Re: [PATCH 1/2] mm/migrate.c: Fix return code when migration fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Michal Hocko <mhocko@xxxxxxxx> writes:

> On Mon 07-08-23 22:31:52, Alistair Popple wrote:
>> 
>> Michal Hocko <mhocko@xxxxxxxx> writes:
>> 
>> > On Mon 07-08-23 16:39:44, Alistair Popple wrote:
>> >> When a page fails to migrate move_pages() returns the error code in a
>> >> per-page array of status values. The function call itself is also
>> >> supposed to return a summary error code indicating that a failure
>> >> occurred.
>> >> 
>> >> This doesn't always happen. Instead success can be returned even
>> >> though some pages failed to migrate. This is due to incorrectly
>> >> returning the error code from store_status() rather than the code from
>> >> add_page_for_migration. Fix this by only returning an error from
>> >> store_status() if the store actually failed.
>> >
>> > Error reporting by this syscall has been really far from
>> > straightforward. Please read through a49bd4d71637 and the section "On a
>> > side note". 
>> > Is there any specific reason you are trying to address this now or is
>> > this motivated by the code inspection?
>> 
>> Thanks Michal. There was no specific reason to address this now other
>> than I came across this behaviour when updating the migration selftest
>> to inspect the status array and thought it was odd. I was seeing pages
>> had failed to migrate according to the status argument even though
>> move_pages() had returned 0 (ie. success) rather than a number of
>> non-migrated pages.
>
> It is good to mention such a motivation in the changelog to make it
> clear. Also do we have a specific test case which trigger this case?

Not explicitly/reliably although I could write one.

>> If I'm interpreting the side note correctly the behaviour you were
>> concerned about was the opposite - returning a fail return code from
>> move_pages() but not indicating failure in the status array.
>> 
>> That said I'm happy to leave the behaviour as is, although in that case
>> an update to the man page is in order to clarify a return value of 0
>> from move_pages() doesn't actually mean all pages were successfully
>> migrated.
>
> While I would say that it is better to let old dogs sleep I do not mind
> changing the behavior and see whether anything breaks. I suspect nobody
> except for couple of test cases hardcoded to the original behavior will
> notice.
>
>> >> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
>> >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>
> The patch itself looks good. I am not sure the fixes tag is accurate.
> Has the reporting been correct before this change? I didn't have time to
> re-read the original code which was quite different.

I dug deeper into the history and the fixes tag is wrong. The behaviour
was actually introduced way back in commit e78bbfa82624 ("mm: stop
returning -ENOENT from sys_move_pages() if nothing got migrated"). As
you may guess from the title it was intentional, so suspect it is better
to update documentation.

> Anyway
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>

Thanks for looking, but I will drop this and see if I can get the man
page updated.

> Anyway rewriting this function to clarify the error handling would be a
> nice exercise if somebody is interested.

Yeah, everytime I look at this function I want to do that but haven't
yet found the time.

>> >> ---
>> >>  mm/migrate.c | 4 +++-
>> >>  1 file changed, 3 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/mm/migrate.c b/mm/migrate.c
>> >> index 24baad2571e3..bb3a37245e13 100644
>> >> --- a/mm/migrate.c
>> >> +++ b/mm/migrate.c
>> >> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> >>  		 * If the page is already on the target node (!err), store the
>> >>  		 * node, otherwise, store the err.
>> >>  		 */
>> >> -		err = store_status(status, i, err ? : current_node, 1);
>> >> +		err1 = store_status(status, i, err ? : current_node, 1);
>> >> +		if (err1)
>> >> +			err = err1;
>> >>  		if (err)
>> >>  			goto out_flush;
>> >>  
>> >> -- 
>> >> 2.39.2





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux