On Thu, Nov 30, 2023 at 03:43:27PM -0500, Waiman Long wrote:
> The rstat_cpu and rstat_css_list fields of the cgroup structure are
> read-mostly variables. However, they may share the same cacheline as the
> subsequent rstat_flush_next and *bstat variables, which can be updated
> frequently. That slows down cgroup_rstat_cpu(), which is called
> frequently in the rstat code. Add a CACHELINE_PADDING() line between
> them to avoid false cacheline sharing.
>
> A parallel kernel build on a 2-socket x86-64 server was used as the
> benchmark for measuring the lock hold time. Below is the lock hold time
> frequency distribution before and after the patch:
>
>   Hold time   Before patch   After patch
>   ---------   ------------   -----------
>    0-01 us       9,928,562     9,820,428
>   01-05 us         110,151        50,935
>   05-10 us             270            93
>   10-15 us             273           146
>   15-20 us             135            76
>   20-25 us               0             2
>   25-30 us               1             0
>
> The patch further pushes the lock hold time towards the lower end.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>

Applied to cgroup/for-6.8.

Thanks.

-- 
tejun