On Thu, Dec 12, 2019 at 9:52 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> 1. We could semi-sort the pages on the LRU list. If we know we're going
> to remove a bunch of pages, we could take a batch of them off the list,
> sort them and remove them in-order. This probably wouldn't be terribly
> effective.

I don't think the sorting is relevant.

Once you batch things, you already get most of the locality advantage
in the cache if it exists (assuming the batch isn't so insanely large
that one batch already overflows the cache).

The problem - I suspect - is that we don't batch at all. Or rather, the
"batching" does exist at a high level, but it's so high up that there's
just tons of stuff going on between single pages.

It is at the shrink_page_list() level, which is pretty high up and
basically does one page at a time, with locking and a lot of tests for
each page, and then we do "__remove_mapping()" (which does some more
work) one page at a time before we actually get to
__delete_from_page_cache().

So it's "batched", but it's in a huge loop, and even at that huge-loop
level the batch size is fairly small. We limit it to SWAP_CLUSTER_MAX,
which is just 32.

Thinking about it, SWAP_CLUSTER_MAX may make sense in some other
circumstances, but not necessarily in the "shrink clean inactive pages"
case.

I wonder if we could just batch clean pages a _lot_ more aggressively.
Yes, our batching loop is still very big and it might not help at the
L1 cache level, but it might at least help at the L2 level.

In kswapd, when we have 28 GB of pages on the inactive list, a batch of
32 pages at a time is pretty small ;)

> 2. We could change struct page to point to the xa_node that holds them.
> Looking up the page mapping would be page->xa_node->array and then
> offsetof(i_pages) to get the mapping.

I don't think we have space in 'struct page', and I'm pretty sure we
don't want to grow it. That's one of the more common data structures
in the kernel.

             Linus
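
[Editor's note: for readers who want to see the shape of the flow the mail
describes, here is a minimal sketch, not the kernel's actual code. The real
shrink_page_list() in mm/vmscan.c runs many more per-page tests (references,
dirtiness, writeback, unmapping); the only point illustrated is that the
lock/test/__remove_mapping() work happens one page at a time, in batches
capped at SWAP_CLUSTER_MAX. The function name shrink_page_list_sketch() is
invented; __remove_mapping() is real but static to vmscan.c, so imagine this
living in that file.]

	/*
	 * Sketch of the per-page reclaim flow (circa v5.4), as if
	 * inside mm/vmscan.c.  Many per-page checks are elided.
	 */
	#include <linux/list.h>
	#include <linux/mm.h>
	#include <linux/pagemap.h>
	#include <linux/swap.h>

	static unsigned long shrink_page_list_sketch(struct list_head *page_list)
	{
		unsigned long nr_reclaimed = 0;
		struct page *page, *next;

		/* page_list holds at most SWAP_CLUSTER_MAX (32) pages */
		list_for_each_entry_safe(page, next, page_list, lru) {
			struct address_space *mapping;

			if (!trylock_page(page))
				continue;	/* per-page locking */

			mapping = page_mapping(page);
			/*
			 * __remove_mapping() takes the i_pages lock,
			 * re-checks the refcount, and ends up in
			 * __delete_from_page_cache() -- once per page.
			 */
			if (mapping && __remove_mapping(mapping, page, true))
				nr_reclaimed++;
			unlock_page(page);
		}
		return nr_reclaimed;
	}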
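[Editor's note: and a hypothetical sketch of the "batch clean pages more
aggressively" idea. delete_from_page_cache_batch() in mm/filemap.c is real --
truncation already uses it to delete a pagevec of pages under a single
i_pages lock acquisition -- but wiring it into reclaim like this is pure
speculation, and reclaim_clean_batch() is an invented name. Page locking and
the "is this page really clean and unreferenced?" checks are elided.]

	/*
	 * Speculative: gather clean pages that share a mapping into a
	 * pagevec and delete the whole batch under one i_pages lock
	 * acquisition, instead of locking per page.
	 */
	#include <linux/list.h>
	#include <linux/pagemap.h>
	#include <linux/pagevec.h>

	static void reclaim_clean_batch(struct address_space *mapping,
					struct list_head *clean_pages)
	{
		struct pagevec pvec;
		struct page *page, *next;

		pagevec_init(&pvec);
		list_for_each_entry_safe(page, next, clean_pages, lru) {
			if (page->mapping != mapping)
				continue;	/* batches are per-mapping */
			if (!pagevec_add(&pvec, page)) {
				/* pagevec full: one lock, one walk of i_pages */
				delete_from_page_cache_batch(mapping, &pvec);
				pagevec_reinit(&pvec);
			}
		}
		if (pagevec_count(&pvec))
			delete_from_page_cache_batch(mapping, &pvec);
	}

Whether grouping by mapping like this actually pays off at the L2 level is
the empirical question the mail raises; nothing here settles it.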