On Mon, Jan 20, 2025 at 7:44 PM Byungchul Park <byungchul@xxxxxx> wrote:
> The *interesting* IPIs will be reduced by 1/512 at most. Can we see
> the improvement number?

Yes, we reduce IPIs by a factor of 512 by sending one IPI (for TLB
flush) per PMD rather than per page. Since shrink_folio_list()
operates on one PMD at a time, I believe we can safely batch these
operations here.

Here's a concrete example, swapping out 20 GiB (5.2M pages):

- Current: each page triggers an IPI to all cores
  - With 6 cores: 31.4M total interrupts (6 cores x 5.2M pages)
- With patch: one IPI per PMD (512 pages)
  - Only 10.2K IPIs required (5.2M / 512)
  - With 6 cores: 61.4K total interrupts
- Result: ~99% reduction in total interrupts

Application performance impact varies by workload, but here's a
representative test case:

- Thread 1: continuously accesses a 2 GiB private anonymous map
  (64 B chunks at random offsets)
- Thread 2: pinned to a different core, uses MADV_PAGEOUT on a
  20 GiB private anonymous map to swap it out to SSD
- The threads only access their respective maps.

Results:

- Without patch: Thread 1 sees a ~53% throughput reduction during
  the swap-out. With multiple worker threads like Thread 1, the
  cumulative throughput degradation would be much higher.
- With patch: Thread 1 maintains normal throughput.

I expect a similar application performance impact when memory reclaim
is triggered by kswapd.
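The interrupt counts above can be sanity-checked with a quick back-of-the-envelope script (not kernel code; it assumes 4 KiB pages and 512 PTEs per PMD, and that each TLB-flush IPI is delivered to all 6 cores):

```python
GIB = 1 << 30
PAGE_SIZE = 4096      # assumed 4 KiB base pages
PTES_PER_PMD = 512    # 512 PTEs per PMD on x86-64 with 4 KiB pages
CORES = 6

pages = 20 * GIB // PAGE_SIZE               # 5,242,880 pages (~5.2M)

# Current behavior: one TLB-flush IPI per page, sent to every core.
ipis_per_page = pages
interrupts_per_page = ipis_per_page * CORES  # ~31.4M interrupts

# With the patch: one IPI per PMD's worth of pages.
ipis_batched = pages // PTES_PER_PMD         # 10,240 IPIs (~10.2K)
interrupts_batched = ipis_batched * CORES    # 61,440 interrupts (~61.4K)

reduction = 1 - interrupts_batched / interrupts_per_page
print(f"pages={pages:,}")
print(f"IPIs: {ipis_per_page:,} -> {ipis_batched:,}")
print(f"interrupts: {interrupts_per_page:,} -> {interrupts_batched:,}")
print(f"reduction: {reduction:.2%}")   # ~99.8%, i.e. a 1/512 cut
```

The ~99% figure in the numbers above is exactly the 1/512 factor: batching changes only how many flush operations are issued, not how many cores each one reaches.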