Re: [PATCH] Fix do_move_pages_to_node() error handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 15-11-18 10:58:33, Piotr Jaroszynski wrote:
> On 11/15/18 12:47 AM, Michal Hocko wrote:
> > On Wed 14-11-18 17:04:37, Piotr Jaroszynski wrote:
[...]
> >> The proposed solution adds a new case to handle, but it will just tell
> >> us the page status is unusable and all we can do is just retry blindly.
> >> If it was possible to plumb through the migration status for each page
> >> accurately that would allow us to save redoing the call for pages that
> >> actually worked. Perhaps we would need a special status for pages
> >> skipped due to errors.
> > 
> > This would be possible but with this patch applied you should know how
> > many pages to skip from the tail of the array.
> 
> At least in our case the node target is the same for all the pages so we
> would just learn that all the pages failed to migrate as they would be
> all batched together to the do_move_pages_to_node() call.

Anyway, could you give this patch a try please? I would appreciate some
Tested-bys to push this forward ;)

> >> But maybe this is all a tiny corner case short of the bug I hit (see
> >> more below) and it's not worth thinking too much about.
> >>
> >>>>>> Just wondering, how have you found out? Is there any real application
> >>>>>> failing because of the change or this is a result of some test?
> >>>>
> >>>> I have a test that creates a tmp file, mmaps it as shared, memsets the
> >>>> memory and then attempts to move it to a different node. It used to
> >>>> work, but now fails. I suspect the filesystem's migratepage() callback
> >>>> regressed and will look into it next. So far I have only tested this on
> >>>> powerpc with the xfs filesystem.
> >>>
> >>> I would be surprise if the rewor changed the migration behavior.
> >>
> >> It didn't, I tracked it down to the new fs/iomap.c code used by xfs not
> >> being compatible with migrate_page_move_mapping() and prepared a perhaps
> >> naive fix in [1].
> > 
> > I am not familiar with iomap code much TBH so I cannot really judge your
> > fix.
> > 
> 
> Christoph reviewed it already (thanks!) so it should be good after all.
> But in its context, I wanted to ask about migrate_page_move_mapping()
> page count checks that it was hitting. Is it true that the count checks
> are to handle the case when a page might be temporarily pinned and hence
> have the count too high temporarily?

Yes. We cannot really migrate pinned pages.

> That would explain why it returns
> EAGAIN in this case. But should having the count too low (what the bug
> was hitting) be a fatal error with a WARN maybe? Or are there expected
> cases where the count is too low temporarily too?

Nope, page reference count too low is a bug.

> I could send a patch
> for that, but also just wanted to understand the expectations.

-- 
Michal Hocko
SUSE Labs




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux