On Thu, 21 Dec 2023 22:25:39 -0800 Chris Li <chrisl@xxxxxxxxxx> wrote: > We discovered that 1% swap page fault is 100us+ while 50% of > the swap fault is under 20us. > > Further investigation show that a large portion of the time > spent in the free_swap_slots() function for the long tail case. > > The percpu cache of swap slots is freed in a batch of 64 entries > inside free_swap_slots(). These cache entries are accumulated > from previous page faults, which may not be related to the current > process. > > Doing the batch free in the page fault handler causes longer > tail latencies and penalizes the current process. > > Move free_swap_slots() outside of the swapin page fault handler into an > async work queue to avoid such long tail latencies. This will require a larger amount of total work than the current scheme. So we're trading that off against better latency. Why is this a good tradeoff?