On 9/2/20 4:31 PM, Pavel Tatashin wrote:
>> > > The fix is to try to drain per-cpu lists again after
>> > > check_pages_isolated_cb() fails.
>>
>> Still trying to wrap my head around this but I think this is not a
>> proper fix. It should be the page isolation to make sure no races are
>> possible with the page freeing path.
>>
>
> As Bharata B Rao found in another thread, the problem was introduced
> by this change:
> c52e75935f8d: mm: remove extra drain pages on pcp list
>
> So, the drain used to be tried every time with lru_add_drain_all();
> Which, I think is excessive, as we start a thread per cpu to try to
> drain and catch a rare race condition. With the proposed change we
> drain again only when we find such a condition. Fixing it in
> start_isolate_page_range means that we must somehow synchronize it
> with the release_pages() which adds costs to runtime code, instead of
> to hot-remove code.

Agreed. Isolation was always racy wrt freeing to pcplists, and it was simply
acceptable to do some extra drains if needed. Removing that race would be indeed
acceptable only if it didn't affect alloc/free fastpaths.

> Pasha
>
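
To make the "drain again only when the check fails" idea concrete, here is a
rough userspace toy model, not kernel code. Every toy_* name, the page states
and the retry limit are made up for illustration; the real offline path is more
involved. The point is only that the cost of the extra drain lands in the
offline loop, while the free path stays untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS  4
#define TOY_NR_PAGES 8

/* Hypothetical page states for the toy model only. */
enum toy_page_state {
	TOY_PAGE_IN_USE,        /* still allocated, cannot be offlined */
	TOY_PAGE_ON_PCP,        /* freed, but parked on a per-cpu list */
	TOY_PAGE_ISOLATED_FREE, /* on the isolated free list */
};

struct toy_zone {
	enum toy_page_state pages[TOY_NR_PAGES];
	unsigned int pcp_count[TOY_NR_CPUS]; /* pages parked per cpu */
	unsigned int drains;                 /* how many drains were needed */
};

/* Start with the whole range already isolated and free. */
void toy_zone_init(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		z->pages[i] = TOY_PAGE_ISOLATED_FREE;
	for (size_t c = 0; c < TOY_NR_CPUS; c++)
		z->pcp_count[c] = 0;
	z->drains = 0;
}

/* Free fast path: no synchronization with isolation, the page just
 * lands on the freeing cpu's list. This is the racy part. */
void toy_free_page(struct toy_zone *z, size_t pfn, size_t cpu)
{
	assert(pfn < TOY_NR_PAGES && cpu < TOY_NR_CPUS);
	assert(z->pages[pfn] == TOY_PAGE_IN_USE);
	z->pages[pfn] = TOY_PAGE_ON_PCP;
	z->pcp_count[cpu]++;
}

/* Flush all per-cpu lists into the isolated free list. */
void toy_drain_all_pages(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		if (z->pages[i] == TOY_PAGE_ON_PCP)
			z->pages[i] = TOY_PAGE_ISOLATED_FREE;
	for (size_t c = 0; c < TOY_NR_CPUS; c++)
		z->pcp_count[c] = 0;
	z->drains++;
}

/* Stand-in for the isolation check: true only if every page in the
 * range is isolated and free. */
bool toy_check_pages_isolated(const struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		if (z->pages[i] != TOY_PAGE_ISOLATED_FREE)
			return false;
	return true;
}

/* Slow path: drain only after a failed check, at most max_retries times. */
bool toy_offline(struct toy_zone *z, unsigned int max_retries)
{
	unsigned int tries = 0;

	while (!toy_check_pages_isolated(z)) {
		if (tries++ == max_retries)
			return false; /* something is really still in use */
		toy_drain_all_pages(z);
	}
	return true;
}
```

In this model a range with no stray pages never pays for a drain, a page that
raced onto a per-cpu list costs exactly one extra drain, and a page that is
genuinely in use makes toy_offline() give up after max_retries drains.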