On Mon, Jan 20, 2025 at 7:44 PM Byungchul Park <byungchul@xxxxxx> wrote:
> The *interesting* IPIs will be reduced by 1/512 at most. Can we see
> the improvement number?

Yes, we reduce IPIs by a factor of 512 by sending one IPI (for TLB
flush) per PMD rather than per page. Since shrink_folio_list()
operates on one PMD at a time, I believe we can safely batch these
operations here.

Here's a concrete example, swapping out 20 GiB (5.2M pages):

- Current: each page triggers an IPI to all cores
  - With 6 cores: 31.4M total interrupts (6 cores x 5.2M pages)
- With patch: one IPI per PMD (512 pages)
  - Only 10.2K IPIs required (5.2M / 512)
  - With 6 cores: 61.4K total interrupts
- Result: ~99% reduction in total interrupts

Application performance impact varies by workload, but here's a
representative test case:

- Thread 1: continuously accesses a 2 GiB private anonymous map
  (64 B chunks at random offsets)
- Thread 2: pinned to a different core, uses MADV_PAGEOUT on a
  20 GiB private anonymous map to swap it out to SSD
- The threads only access their respective maps.

Results:

- Without patch: Thread 1 sees a ~53% throughput reduction during
  the swap-out. With multiple worker threads like Thread 1, the
  cumulative throughput degradation would be much higher.
- With patch: Thread 1 maintains normal throughput.

I expect a similar application performance impact when memory reclaim
is triggered by kswapd.
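The interrupt counts above can be sanity-checked with a quick back-of-the-envelope script (not kernel code; it assumes 4 KiB pages and 512 PTEs per PMD, and that each TLB-flush IPI is delivered to all 6 cores):

```python
GIB = 1 << 30
PAGE_SIZE = 4096      # assumed 4 KiB base pages
PTES_PER_PMD = 512    # 512 PTEs per PMD on x86-64 with 4 KiB pages
CORES = 6

pages = 20 * GIB // PAGE_SIZE               # 5,242,880 pages (~5.2M)

# Current behavior: one TLB-flush IPI per page, sent to every core.
ipis_per_page = pages
interrupts_per_page = ipis_per_page * CORES  # ~31.4M interrupts

# With the patch: one IPI per PMD's worth of pages.
ipis_batched = pages // PTES_PER_PMD         # 10,240 IPIs (~10.2K)
interrupts_batched = ipis_batched * CORES    # 61,440 interrupts (~61.4K)

reduction = 1 - interrupts_batched / interrupts_per_page
print(f"pages={pages:,}")
print(f"IPIs: {ipis_per_page:,} -> {ipis_batched:,}")
print(f"interrupts: {interrupts_per_page:,} -> {interrupts_batched:,}")
print(f"reduction: {reduction:.2%}")   # ~99.8%, i.e. a 1/512 cut
```

The ~99% figure in the numbers above is exactly the 1/512 factor: batching changes only how many flush operations are issued, not how many cores each one reaches.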