On Thu 15-11-18 10:58:33, Piotr Jaroszynski wrote: > On 11/15/18 12:47 AM, Michal Hocko wrote: > > On Wed 14-11-18 17:04:37, Piotr Jaroszynski wrote: [...] > >> The proposed solution adds a new case to handle, but it will just tell > >> us the page status is unusable and all we can do is just retry blindly. > >> If it was possible to plumb through the migration status for each page > >> accurately that would allow us to save redoing the call for pages that > >> actually worked. Perhaps we would need a special status for pages > >> skipped due to errors. > > > > This would be possible but with this patch applied you should know how > > many pages to skip from the tail of the array. > > At least in our case the node target is the same for all the pages so we > would just learn that all the pages failed to migrate as they would be > all batched together to the do_move_pages_to_node() call. Anyway, could you give this patch a try please? I would appreciate some Tested-bys to push this forward ;) > >> But maybe this is all a tiny corner case short of the bug I hit (see > >> more below) and it's not worth thinking too much about. > >> > >>>>>> Just wondering, how have you found out? Is there any real application > >>>>>> failing because of the change or this is a result of some test? > >>>> > >>>> I have a test that creates a tmp file, mmaps it as shared, memsets the > >>>> memory and then attempts to move it to a different node. It used to > >>>> work, but now fails. I suspect the filesystem's migratepage() callback > >>>> regressed and will look into it next. So far I have only tested this on > >>>> powerpc with the xfs filesystem. > >>> > >>> I would be surprise if the rewor changed the migration behavior. > >> > >> It didn't, I tracked it down to the new fs/iomap.c code used by xfs not > >> being compatible with migrate_page_move_mapping() and prepared a perhaps > >> naive fix in [1]. > > > > I am not familiar with iomap code much TBH so I cannot really judge your > > fix. > > > > Christoph reviewed it already (thanks!) so it should be good after all. > But in its context, I wanted to ask about migrate_page_move_mapping() > page count checks that it was hitting. Is it true that the count checks > are to handle the case when a page might be temporarily pinned and hence > have the count too high temporarily? Yes. We cannot really migrate pinned pages. > That would explain why it returns > EAGAIN in this case. But should having the count too low (what the bug > was hitting) be a fatal error with a WARN maybe? Or are there expected > cases where the count is too low temporarily too? Nope, page reference count too low is a bug. > I could send a patch > for that, but also just wanted to understand the expectations. -- Michal Hocko SUSE Labs