On 24.05.2018 16:39, Michal Hocko wrote: > [I didn't really go through other patch but this one caught my eyes just > because of the similar request proposed yesterday] > > On Wed 23-05-18 17:11:50, David Hildenbrand wrote: > [...] >> @@ -1686,6 +1686,10 @@ static int __ref __offline_pages(unsigned long start_pfn, >> pfn = scan_movable_pages(start_pfn, end_pfn); >> if (pfn) { /* We have movable pages */ >> ret = do_migrate_range(pfn, end_pfn); >> + if (ret && !retry_forever) { >> + ret = -EBUSY; >> + goto failed_removal; >> + } >> goto repeat; >> } >> > > Btw. this will not work in practice. Even a single temporary pin on a page > will fail way too easily. If you really need to control this then make > it a retry counter with default -1UL. Interestingly, this will work for the one specific use case that I am using this interface for right now. The reason is that I don't want to offline a specific chunk, I want to find some chunks to offline (in contrast to e.g. DIMMs where you rely try to offline a very specific one). If I get a failure on that chunk (e.g. temporary pin) I will retry the next chunk. At one point, I will eventually retry this chunk and then it succeeds. > > We really do need a better error reporting from do_migrate_range and > distinguish transient from permanent failures. In general we shouldn't > even get here for pages which are not migrateable... I totally agree, I also want to know if an error is permanent or transient - and I want the posibility to "fail fast" (e.g. -EAGAIN) instead of looping forever. -- Thanks, David / dhildenb