On 2024/1/19 03:24, Nhat Pham wrote:
> On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou
> <zhouchengming@xxxxxxxxxxxxx> wrote:
>>
>> Each swapfile has one rb-tree to search the mapping of swp_entry_t to
>> zswap_entry, that use a spinlock to protect, which can cause heavy lock
>> contention if multiple tasks zswap_store/load concurrently.
>>
>> Optimize the scalability problem by splitting the zswap rb-tree into
>> multiple rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M),
>> just like we did in the swap cache address_space splitting.
>>
>> Although this method can't solve the spinlock contention completely, it
>> can mitigate much of that contention. Below is the results of kernel build
>> in tmpfs with zswap shrinker enabled:
>>
>>        linux-next    zswap-lock-optimize
>> real   1m9.181s      1m3.820s
>> user   17m44.036s    17m40.100s
>> sys    7m37.297s     4m54.622s
>
> That's really impressive, especially the sys time reduction :) Well done.
>

Thanks!

>>
>> So there are clearly improvements.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
>
> Code looks solid too. I haven't read the xarray patch series too
> closely yet, but this patch series is clearly already an improvement.
> It is simple, with existing precedent (from swap cache), and
> experiments show that it works quite well to improve zswap's
> performance.
>
> If the xarray patch proves to be even better, we can always combine it
> with this approach (a per-range xarray?), or replace it with the
> xarray. But for now:
>
> Acked-by: Nhat Pham <nphamcs@xxxxxxxxx>
>

Right, I agree. We should combine both approaches.
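
For readers following the thread, the per-range split described in the quoted changelog can be sketched roughly as below. This is a minimal userspace illustration, not the code from the patch: struct range_tree, struct swapfile_trees and all of the helper names are hypothetical stand-ins, the lock and rb-tree root are placeholders, and the shift of 14 assumes 4K pages so that each range covers 64M. The point is only that a swap offset selects one of several independently locked trees, so tasks working on different 64M ranges no longer contend on a single lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* 2^14 pages per range, i.e. 64M with 4K pages (assumed page size). */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1UL << SWAP_ADDRESS_SPACE_SHIFT)

/* Hypothetical stand-in for one rb-tree root and the lock guarding it. */
struct range_tree {
    int lock;     /* placeholder for a spinlock */
    void *root;   /* placeholder for an rb-tree root */
};

/* Hypothetical per-swapfile container: one tree per 64M range. */
struct swapfile_trees {
    unsigned long nr_pages;
    size_t nr_trees;
    struct range_tree *trees;
};

/* Number of trees needed to cover nr_pages, rounding up. */
static size_t nr_range_trees(unsigned long nr_pages)
{
    return (nr_pages + SWAP_ADDRESS_SPACE_PAGES - 1) >> SWAP_ADDRESS_SPACE_SHIFT;
}

/* Allocate zeroed trees for a swapfile; returns 0 on success, -1 on failure. */
static int swapfile_trees_init(struct swapfile_trees *st, unsigned long nr_pages)
{
    if (nr_pages == 0)
        return -1;
    st->nr_pages = nr_pages;
    st->nr_trees = nr_range_trees(nr_pages);
    st->trees = calloc(st->nr_trees, sizeof(*st->trees));
    return st->trees ? 0 : -1;
}

/* Pick the tree (and therefore the lock) responsible for a swap offset. */
static struct range_tree *tree_for_offset(struct swapfile_trees *st,
                                          unsigned long offset)
{
    assert(offset < st->nr_pages);
    return &st->trees[offset >> SWAP_ADDRESS_SPACE_SHIFT];
}

/* Release the array of trees. */
static void swapfile_trees_destroy(struct swapfile_trees *st)
{
    free(st->trees);
    st->trees = NULL;
    st->nr_trees = 0;
}
```

A store or load path would call tree_for_offset() first and take only that tree's lock. Offsets within the same 64M range still share a lock, which is why the changelog says the contention is mitigated rather than removed.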