On Thu, 19 Dec 2024 15:32:53 -0500 Rik van Riel <riel@xxxxxxxxxxx> wrote:

> There seem to be several categories of calls to lru_add_drain
> and lru_add_drain_all.
>
> The first are code paths that recently allocated, swapped in,
> or otherwise processed a batch of pages, and want them all on
> the LRU. These drain pages that were recently allocated,
> probably on the local CPU.
>
> A second category are code paths that are actively trying to
> reclaim, migrate, or offline memory. These often use lru_add_drain_all,
> to drain the caches on all CPUs.
>
> However, there also seem to be some other callers where we
> aren't really doing either. They are calling lru_add_drain(),
> despite operating on pages that may have been allocated
> long ago, and quite possibly on different CPUs.
>
> Those calls are not likely to be effective at anything but
> creating lock contention on the LRU locks.
>
> Remove the lru_add_drain calls in the latter category.

These lru_add_drain() calls are the sorts of things we've added as
bugfixes when things go weird in unexpected situations. So the need
for them can be obscure.

I'd be more comfortable if we'd gone through them all, hunted down the
commits which added them, learned why these calls were added then
explained why that reasoning is no longer valid.

A lot of the ones you're removing precede a tlb_gather_mmu() operation.
I wonder why we have (or had) that pattern?