On Fri, Nov 03, 2023 at 11:13:01PM -0400, Waiman Long wrote:
> When cgroup_rstat_updated() isn't being called concurrently with
> cgroup_rstat_flush_locked(), its run time is pretty short. When
> both are called concurrently, the cgroup_rstat_updated() run time
> can spike to a pretty high value due to high cpu_lock hold time in
> cgroup_rstat_flush_locked(). This can be problematic if the task calling
> cgroup_rstat_updated() is a realtime task running on an isolated CPU
> with a strict latency requirement. The cgroup_rstat_updated() call can
> happen when there is a page fault even though the task is running in
> user space most of the time.
>
> The percpu cpu_lock is used to protect the update tree -
> updated_next and updated_children. This protection is only needed when
> cgroup_rstat_cpu_pop_updated() is being called. The subsequent flushing
> operation, which can take a much longer time, does not need that protection
> as it is already protected by cgroup_rstat_lock.
>
> To reduce the cpu_lock hold time, we need to perform all the
> cgroup_rstat_cpu_pop_updated() calls up front with the lock
> released afterward before doing any flushing. This patch adds a new
> cgroup_rstat_updated_list() function to return a singly linked list of
> cgroups to be flushed.
>
> Some instrumentation code is added to measure the cpu_lock hold time
> from right after lock acquisition to right after releasing the lock.
> A parallel kernel build on a 2-socket x86-64 server is used as the
> benchmarking tool for measuring the lock hold time.
>
> The maximum cpu_lock hold times before and after the patch are 100us and
> 29us respectively, so the worst-case time is reduced to about 30% of
> the original. However, there may be some OS or hardware noise such as NMIs
> or SMIs in the test system that can worsen the worst-case value. Such
> noise is usually tuned out in a real production environment to get
> a better result.
>
> OTOH, the lock hold time frequency distribution should give a better
> idea of the performance benefit of the patch. Below are the frequency
> distributions before and after the patch:
>
>       Hold time    Before patch   After patch
>       ---------    ------------   -----------
>        0-01 us        804,139     13,738,708
>       01-05 us      9,772,767      1,177,194
>       05-10 us      4,595,028          4,984
>       10-15 us        303,481          3,562
>       15-20 us         78,971          1,314
>       20-25 us         24,583             18
>       25-30 us          6,908             12
>       30-40 us          8,015
>       40-50 us          2,192
>       50-60 us            316
>       60-70 us             43
>       70-80 us              7
>       80-90 us              2
>        >90 us               3
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> Reviewed-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>

Applied this one to cgroup/for-6.8. Will wait for the updated version
for the other patches.

Thanks.

-- 
tejun
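
[Editor's illustration] The quoted changelog describes a general locking
pattern: detach the whole pending list in one short critical section, then do
the slow per-item flushing with the lock already released. The following is a
minimal, self-contained userspace C sketch of that pattern only; it is not the
kernel patch, and all names in it (pending_lock, pending_head, node,
mark_updated, pop_pending_list, flush_all) are hypothetical stand-ins for the
rstat machinery.

	/* Sketch of "pop under lock, flush outside the lock". */
	#include <pthread.h>
	#include <stddef.h>
	#include <stdio.h>

	struct node {
		int id;
		struct node *next;	/* link in the pending/update list */
	};

	static pthread_spinlock_t pending_lock;	/* stand-in for the percpu cpu_lock */
	static struct node *pending_head;	/* stand-in for the per-cpu update tree */

	/* Updater side: short critical section, like cgroup_rstat_updated(). */
	static void mark_updated(struct node *n)
	{
		pthread_spin_lock(&pending_lock);
		n->next = pending_head;
		pending_head = n;
		pthread_spin_unlock(&pending_lock);
	}

	/*
	 * Flusher side: detach the whole pending list while holding the lock,
	 * analogous to what cgroup_rstat_updated_list() is described as doing.
	 */
	static struct node *pop_pending_list(void)
	{
		struct node *head;

		pthread_spin_lock(&pending_lock);
		head = pending_head;
		pending_head = NULL;
		pthread_spin_unlock(&pending_lock);
		return head;
	}

	static void flush_all(void)
	{
		struct node *n = pop_pending_list();	/* lock held only in here */

		for (; n; n = n->next)			/* slow work, lock released */
			printf("flushing node %d\n", n->id);
	}

	int main(void)
	{
		struct node a = { .id = 1 }, b = { .id = 2 };

		pthread_spin_init(&pending_lock, PTHREAD_PROCESS_PRIVATE);
		mark_updated(&a);
		mark_updated(&b);
		flush_all();
		pthread_spin_destroy(&pending_lock);
		return 0;
	}

The point the sketch makes is that the lock hold time now scales with the cost
of walking the pending list, not with the cost of flushing each entry, which is
why the worst-case hold times in the table above drop so sharply.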