On Wed, Dec 11, 2019 at 12:18:38PM -0800, Linus Torvalds wrote:
> On Wed, Dec 11, 2019 at 12:08 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >
> > $ cat /proc/meminfo | grep -i active
> > Active:           134136 kB
> > Inactive:       28683916 kB
> > Active(anon):      97064 kB
> > Inactive(anon):        4 kB
> > Active(file):      37072 kB
> > Inactive(file): 28683912 kB
>
> Yeah, that should not put pressure on some swap activity. We have 28
> GB of basically free inactive file data, and the VM is doing something
> very very bad if it then doesn't just quickly free it with no real
> drama.

I was looking at this with Jens offline last week. One thing to note
is the rate of IO that Jens is working with: combined with the low
cache hit rate, it was pushing upwards of half a million pages through
the page cache each second.

There isn't anything obvious sticking out in the kswapd profile: it's
dominated by cache tree deletions (or rather replacing pages with
shadow entries, hence the misleading xas_store()), tree lock
contention, etc. - all work that a direct reclaimer would have to do
as well, with one exception: RWC_UNCACHED doesn't need to go through
the LRU list, and 8-9% of kswapd cycles alone are going into
physically getting pages off the list. (And I suspect part of that is
also contention over the LRU lock as kswapd gets overwhelmed and
direct reclaim kicks in).

Jens, how much throughput difference does kswapd vs RWC_UNCACHED make?
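
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It parses the quoted meminfo output and converts the "half a million pages each second" figure into bandwidth and an approximate turnover time for the inactive file list. The 4 KiB page size is an assumption (the default on x86-64, not stated in the thread), the page rate is rounded to exactly 500,000, and the helper names are made up for illustration.

```python
import re

# The meminfo excerpt quoted in the thread.
MEMINFO_SAMPLE = """\
Active:           134136 kB
Inactive:       28683916 kB
Active(anon):      97064 kB
Inactive(anon):        4 kB
Active(file):      37072 kB
Inactive(file): 28683912 kB
"""

PAGE_SIZE_KB = 4          # ASSUMPTION: 4 KiB pages, not stated in the thread
PAGES_PER_SEC = 500_000   # "upwards of half a million pages" per second, rounded


def parse_meminfo(text):
    """Return a dict mapping meminfo field names to their values in kB."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(\S+):\s+(\d+)\s+kB$", line.strip())
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def churn_stats(fields, pages_per_sec=PAGES_PER_SEC, page_kb=PAGE_SIZE_KB):
    """Estimate page count, bandwidth (MiB/s), and list turnover time (s)."""
    inactive_file_pages = fields["Inactive(file)"] // page_kb
    throughput_mib_s = pages_per_sec * page_kb / 1024
    turnover_s = inactive_file_pages / pages_per_sec
    return inactive_file_pages, throughput_mib_s, turnover_s


if __name__ == "__main__":
    pages, mib_s, secs = churn_stats(parse_meminfo(MEMINFO_SAMPLE))
    print(f"inactive file pages: {pages}")
    print(f"page cache churn:    {mib_s:.1f} MiB/s")
    print(f"full list turnover:  {secs:.1f} s")
```

Under those assumptions the workload moves roughly 2 GB/s through the page cache, and reclaim would have to cycle through the entire 28 GB inactive file list about every 14 seconds just to keep up. This is only an order-of-magnitude estimate, not a measurement.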