From: Kairui Song <kasong@xxxxxxxxxxx>

This series removes the global swap cgroup lock. The critical section of
this lock is very short, but it is still a bottleneck for massively
parallel swap workloads.

Up to 10% performance gain for a kernel build test on tmpfs on a 48c96t
system under memory pressure, and no regression for other cases.

V2: https://lore.kernel.org/linux-mm/20241210092805.87281-1-ryncsn@xxxxxxxxx/
Updates since V2:
- Micro optimization for bit operations in patch 3 [Chris Li]
- Improve BUILD_BUG_ON to cover potential arch corner cases [Chris Li]
- Introduce patch 4, make the swap_cgroup tracking code more robust
  [Chris Li]

V1: https://lore.kernel.org/linux-mm/20241202184154.19321-1-ryncsn@xxxxxxxxx/
Updates since V1:
- Collect Reviews and Acks.
- Use bit shift instead of a mixed usage of short and atomic for
  emulating 2 byte xchg [Chris Li]
- Merge patch 3 into patch 4 for simplicity [Roman Gushchin]
- Drop the call to mem_cgroup_disabled in patch 1 instead, and also fix
  the bot build error [Yosry Ahmed]
- Wrap the access of the atomic_t map with helpers properly, so the
  emulation can be dropped in favor of a native 2 byte xchg once
  available.

Kairui Song (4):
  mm, memcontrol: avoid duplicated memcg enable check
  mm/swap_cgroup: remove swap_cgroup_cmpxchg
  mm/swap_cgroup: remove global swap cgroup lock
  mm/swap_cgroup: decouple swap cgroup recording and clearing

 include/linux/swap_cgroup.h |  14 ++--
 mm/memcontrol.c             |  15 ++--
 mm/swap_cgroup.c            | 148 +++++++++++++++++++-----------------
 3 files changed, 93 insertions(+), 84 deletions(-)

--
2.47.1