From: Kairui Song <kasong@xxxxxxxxxxx>

This series removes the global swap cgroup lock. The critical section of
this lock is very short, but it is still a bottleneck for massively
parallel swap workloads.

Up to 10% performance gain for the tmpfs kernel build test on a 48c96t
system, and no regression for other cases:

Testing using a 64G brd and building the kernel with make -j96 in a 1.5G
memory cgroup using 4k folios showed the improvement below (10 test runs):

Before this series:
Sys time: 10809.46 (stdev 80.831491)
Real time: 171.41 (stdev 1.239894)

After this series:
Sys time: 9621.26 (stdev 34.620000), -10.42%
Real time: 160.00 (stdev 0.497814), -6.57%

With 64k folios and 2G memcg:

Before this series:
Sys time: 8231.99 (stdev 30.030994)
Real time: 143.57 (stdev 0.577394)

After this series:
Sys time: 7403.47 (stdev 6.270000), -10.06%
Real time: 135.18 (stdev 0.605000), -5.84%

Sequential swapout of 8G 64k zero folios (24 test runs):
Before this series: 5461409.12 us (stdev 183957.827084)
After this series:  5420447.26 us (stdev 196419.240317)

Sequential swapin of 8G 4k zero folios (24 test runs):
Before this series: 19736958.916667 us (stdev 189027.246676)
After this series:  19662182.629630 us (stdev 172717.640614)

V1: https://lore.kernel.org/linux-mm/20241202184154.19321-1-ryncsn@xxxxxxxxx/
Updates:
- Collect Review and Ack.
- Use bit shift instead of a mixed usage of short and atomic for
  emulating 2 byte xchg [Chris Li]
- Merge patch 3 into patch 4 for simplicity [Roman Gushchin].
- Drop the call of mem_cgroup_disabled in patch 1 instead, and also fix
  the bot build error [Yosry Ahmed]
- Wrap the access of the atomic_t map with helpers properly, so the
  emulation can be dropped in favor of a native 2 byte xchg once
  available.
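
For readers unfamiliar with the technique, the bit-shift emulation of a
2 byte xchg mentioned in the updates above can be sketched roughly as
follows. This is a userspace C11 illustration under assumptions, not the
code from the patches: the names id_shift, id_read and id_xchg, and the
use of <stdatomic.h> in place of the kernel's atomic_t helpers, are all
hypothetical. Two 16-bit IDs are packed into each 32-bit atomic word, and
one slot is replaced with a compare-and-exchange loop so the neighbouring
slot is never disturbed.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 16-bit IDs packed into each 32-bit word. */
#define ID_BITS     16u
#define ID_PER_WORD (sizeof(uint32_t) * 8 / ID_BITS)
#define ID_MASK     ((1u << ID_BITS) - 1)

/* Bit offset of slot `idx` inside the 32-bit word that holds it. */
static unsigned id_shift(size_t idx)
{
	return (unsigned)(idx % ID_PER_WORD) * ID_BITS;
}

/* Read the 16-bit ID stored in slot `idx`. */
uint16_t id_read(_Atomic uint32_t *map, size_t idx)
{
	uint32_t word = atomic_load(&map[idx / ID_PER_WORD]);

	return (uint16_t)((word >> id_shift(idx)) & ID_MASK);
}

/*
 * Atomically store `new_id` into slot `idx` and return the previous ID.
 * Only the 16 bits of the target slot change; the other half of the
 * word is carried over unmodified on every retry.
 */
uint16_t id_xchg(_Atomic uint32_t *map, size_t idx, uint16_t new_id)
{
	_Atomic uint32_t *word = &map[idx / ID_PER_WORD];
	unsigned shift = id_shift(idx);
	uint32_t old = atomic_load(word);
	uint32_t next;

	do {
		next = (old & ~(ID_MASK << shift)) | ((uint32_t)new_id << shift);
		/* On failure, `old` is refreshed with the current word. */
	} while (!atomic_compare_exchange_weak(word, &old, next));

	return (uint16_t)((old >> shift) & ID_MASK);
}
```

Keeping the shift and mask arithmetic behind small helpers like these is
what would let the emulation be swapped for a native 2 byte xchg later
without touching the callers.
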
Kairui Song (3):
  mm, memcontrol: avoid duplicated memcg enable check
  mm/swap_cgroup: remove swap_cgroup_cmpxchg
  mm, swap_cgroup: remove global swap cgroup lock

 include/linux/swap_cgroup.h |  2 -
 mm/memcontrol.c             |  2 +-
 mm/swap_cgroup.c            | 96 ++++++++++++++++---------------------
 3 files changed, 43 insertions(+), 57 deletions(-)

--
2.47.1