On 27.02.19 23:00, Mike Kravetz wrote:
> On 2/27/19 1:51 PM, Oscar Salvador wrote:
>> On Thu, Feb 21, 2019 at 10:42:12AM +0100, Oscar Salvador wrote:
>>> [1] https://lore.kernel.org/patchwork/patch/998796/
>>>
>>> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
>>
>> Any further comments on this?
>> I do have a "concern" I would like to sort out before dropping the RFC:
>>
>> It is the fact that unless we have spare gigantic pages in other nodes, the
>> offlining operation will loop forever (until the customer cancels the operation).
>> While I do not really like that, I do think that memory offlining should be done
>> with some sanity, and the administrator should know in advance if the system is going
>> to be able to keep up with the memory pressure, aka: make sure we got what we need
>> for the offlining operation to succeed.
>> That translates to making sure that we have spare gigantic pages and other nodes
>> can take them.
>>
>> That said, another thing I thought about is that we could check if we have
>> spare gigantic pages at has_unmovable_pages() time.
>> Something like checking "h->free_huge_pages - h->resv_huge_pages > 0", and if it
>> turns out that we do not have gigantic pages anywhere, just return as we have
>> non-movable pages.
>
> Of course, that check would be racy. Even if there is an available gigantic
> page at has_unmovable_pages() time there is no guarantee it will be there when
> we want to allocate/use it. But, you would at least catch 'most' cases of
> looping forever.
>
>> But I would rather not convolute has_unmovable_pages() with such checks and "trust"
>> the administrator.

I think we have the exact same issue already with huge/ordinary pages if
we are low on memory. We could loop forever.

In the long run, we should properly detect such issues and abort instead
of looping forever I guess. But as we all know, error handling in the
whole offlining part is still far away from being perfect ...
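
For reference, the spare-page check quoted above could look roughly like the
sketch below. This is only an illustration and not code from the patch: the
struct hstate_counts type and the has_spare_huge_page() helper are made-up
names that stand in for the two struct hstate counters mentioned in the
thread, and the sketch does nothing about the race Mike points out.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two counters quoted in the thread.
 * The real struct hstate has many more fields.
 */
struct hstate_counts {
	unsigned long free_huge_pages;	/* pages currently in the free pool */
	unsigned long resv_huge_pages;	/* free pages already reserved */
};

/*
 * Hypothetical helper: true if at least one free page is not reserved.
 * The result is only a snapshot; the page may be gone by the time
 * the caller tries to allocate it.
 */
static bool has_spare_huge_page(const struct hstate_counts *h)
{
	/*
	 * Compare instead of subtracting: both counters are unsigned, so
	 * "free - resv > 0" would wrap around if resv ever exceeded free.
	 */
	return h->free_huge_pages > h->resv_huge_pages;
}
```

A caller would treat a false result as "nothing to migrate to" and bail out
early instead of retrying, which is the 'most cases' filter described above.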
-- 

Thanks,

David / dhildenb