On 1/31/22 17:29, David Hildenbrand wrote:
> For example, if a page just got swapped in via a read fault, the LRU
> pagevecs might still hold a reference to the page. If we trigger a
> write fault on such a page, the additional reference from the LRU
> pagevecs will prohibit reusing the page.
>
> Let's conditionally drain the local LRU pagevecs when we stumble over a
> !PageLRU() page. We cannot easily drain remote LRU pagevecs and it might
> not be desirable performance-wise. Consequently, this will only avoid
> copying in some cases.
>
> Add a simple "page_count(page) > 3" check first but keep the
> "page_count(page) > 1 + PageSwapCache(page)" check in place, as
> we want to minimize cases where we remove a page from the swapcache but
> won't be able to reuse it, for example, because another process has it
> mapped R/O, to not affect reclaim.
>
> We cannot easily handle the following cases and we will always have to
> copy:
>
> (1) The page is referenced in the LRU pagevecs of other CPUs. We really
>     would have to drain the LRU pagevecs of all CPUs -- most probably
>     copying is much cheaper.
>
> (2) The page is already PageLRU() but is getting moved between LRU
>     lists, for example, for activation (e.g., mark_page_accessed()),
>     deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
>     drain mostly unconditionally, which might be bad performance-wise.
>     Most probably this won't happen too often in practice.
>
> Note that there are other reasons why an anon page might temporarily not
> be PageLRU(): for example, compaction and migration have to isolate LRU
> pages from the LRU lists first (isolate_lru_page()), moving them to
> temporary local lists and clearing PageLRU() and holding an additional
> reference on the page. In that case, we'll always copy.
>
> This change seems to be fairly effective with the reproducer [1] shared
> by Nadav, as long as writeback is done synchronously, for example, using
> zram. However, with asynchronous writeback, we'll usually fail to free the
> swapcache because the page is still under writeback: something we cannot
> easily optimize for, and maybe it's not really relevant in practice.
>
> [1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@xxxxxxxxx
>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
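
For anyone following along, here is a rough userspace sketch of the check
order the commit message describes. It is a toy model, not the patch itself:
struct toy_page and every toy_* name are hypothetical stand-ins, and reading
the "3" as one reference for the mapping, one for the swapcache and one for
a local LRU pagevec is an assumption on my side.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only what the decision needs. */
struct toy_page {
	int refcount;            /* models page_count(page) */
	bool on_lru;             /* models PageLRU(page) */
	bool in_swapcache;       /* models PageSwapCache(page) */
	bool under_writeback;    /* swapcache cannot be freed while set */
	int local_pagevec_refs;  /* refs held by this CPU's LRU pagevecs */
	int remote_pagevec_refs; /* refs held by other CPUs' LRU pagevecs */
};

/* Models draining only the local LRU pagevecs; remote refs stay put. */
static void toy_lru_add_drain(struct toy_page *p)
{
	if (p->local_pagevec_refs > 0) {
		p->refcount -= p->local_pagevec_refs;
		p->local_pagevec_refs = 0;
		p->on_lru = true;
	}
}

/* Models freeing the swapcache entry, which fails under writeback. */
static void toy_try_to_free_swap(struct toy_page *p)
{
	if (p->in_swapcache && !p->under_writeback) {
		p->in_swapcache = false;
		p->refcount--;
	}
}

/* Returns true if the write fault may reuse the page, false if it copies. */
bool toy_can_reuse(struct toy_page *p)
{
	assert(p->refcount >= 1);

	/* Cheap early check: too many references, little hope of reuse. */
	if (p->refcount > 3)
		return false;

	/* Not on the LRU: a local pagevec might hold the extra reference. */
	if (!p->on_lru)
		toy_lru_add_drain(p);

	/* Still more refs than "us + swapcache": copy, keep the swapcache. */
	if (p->refcount > 1 + (p->in_swapcache ? 1 : 0))
		return false;

	toy_try_to_free_swap(p);

	/* Reuse only if we now hold the sole reference. */
	return p->refcount == 1;
}
```

In this model, a freshly swapped-in page with refcount 3 (mapping, swapcache,
local pagevec) gets drained, loses its swapcache entry and is reused. If the
third reference sits in a remote pagevec instead, or belongs to a second
process mapping the page R/O, the second check bails out and the swapcache
is left untouched. A page under writeback passes both checks but still ends
up copied because the swapcache entry cannot be freed.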