> > > The fix is to try to drain per-cpu lists again after
> > > check_pages_isolated_cb() fails.
>
> Still trying to wrap my head around this but I think this is not a
> proper fix. It should be the page isolation to make sure no races are
> possible with the page freeing path.
>

As Bharata B Rao found in another thread, the problem was introduced by
this change:

c52e75935f8d: mm: remove extra drain pages on pcp list

So, the drain used to be tried every time with lru_add_drain_all(),
which I think is excessive, as we start a thread per CPU to try to
drain and catch a rare race condition. With the proposed change we
drain again only when we find such a condition.

Fixing it in start_isolate_page_range() means that we must somehow
synchronize it with release_pages(), which adds cost to runtime code
instead of to hot-remove code.

Pasha