v4:
 - Update patch 2 to fix a minor bug and update some of the comments.

v3:
 - Minor comment tweaks as suggested by Yosry.
 - Add patches 2 and 3 to further reduce lock hold time.

The purpose of this patch series is to reduce the cpu_lock hold time in
cgroup_rstat_flush_locked() so as to reduce the latency impact when
cgroup_rstat_updated() is called, as the two may contend with each other
on the cpu_lock.

A parallel kernel build on a 2-socket x86-64 server was used as the
benchmark for measuring the lock hold time. Below are the lock hold
time frequency distributions before and after applying different
numbers of patches:

  Hold time   Before patch      Patch 1   Patches 1-2   Patches 1-3
  ---------   ------------   ----------   -----------   -----------
   0-01 us         804,139   13,738,708    14,594,545    15,484,707
  01-05 us       9,772,767    1,177,194       439,926       207,382
  05-10 us       4,595,028        4,984         5,960         3,174
  10-15 us         303,481        3,562         3,543         3,006
  15-20 us          78,971        1,314         1,397         1,066
  20-25 us          24,583           18            25            15
  25-30 us           6,908           12            12            10
  30-40 us           8,015
  40-50 us           2,192
  50-60 us             316
  60-70 us              43
  70-80 us               7
  80-90 us               2
    >90 us               3

Waiman Long (3):
  cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked()
  cgroup/rstat: Optimize cgroup_rstat_updated_list()
  cgroup: Avoid false cacheline sharing of read mostly rstat_cpu

 include/linux/cgroup-defs.h |  14 ++++
 kernel/cgroup/rstat.c       | 131 +++++++++++++++++++++---------------
 2 files changed, 91 insertions(+), 54 deletions(-)

--
2.39.3