Re: [PATCH v2] cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked()

On 11/2/23 00:35, Yosry Ahmed wrote:
On Wed, Nov 1, 2023 at 5:53 PM Waiman Long <longman@xxxxxxxxxx> wrote:
When cgroup_rstat_updated() isn't being called concurrently with
cgroup_rstat_flush_locked(), its run time is pretty short. When
both are called concurrently, the cgroup_rstat_updated() run time
can spike to a pretty high value due to high cpu_lock hold time in
cgroup_rstat_flush_locked(). This can be problematic if the task calling
cgroup_rstat_updated() is a realtime task running on an isolated CPU
with a strict latency requirement. The cgroup_rstat_updated() call can
happens when there is a page fault even though the task is running in
s/happens/happen

user space most of the time.

The percpu cpu_lock is used to protect the update tree -
updated_next and updated_children. This protection is only needed
when cgroup_rstat_cpu_pop_updated() is being called. The subsequent
flushing operation which can take a much longer time does not need
that protection.
nit: add: as it is already protected by cgroup_rstat_lock.

To reduce the cpu_lock hold time, we need to perform all the
cgroup_rstat_cpu_pop_updated() calls up front with the lock
released afterward before doing any flushing. This patch adds a new
cgroup_rstat_updated_list() function to return a singly linked list of
cgroups to be flushed.

By adding some instrumentation code to measure the maximum elapsed times
of the new cgroup_rstat_updated_list() function and each cpu iteration of
cgroup_rstat_flush_locked() around the old cpu_lock lock/unlock pair
on a 2-socket x86-64 server running parallel kernel build, the maximum
elapsed times are 27us and 88us respectively. The maximum cpu_lock hold
time is now reduced to about 30% of the original.

Below is the run time distribution of cgroup_rstat_updated_list()
during the same period:

       Run time             Count
       --------             -----
          t <= 1us       12,574,302
    1us < t <= 5us        2,127,482
    5us < t <= 10us           8,445
   10us < t <= 20us           6,425
   20us < t <= 30us              50

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
LGTM with some nits.

Reviewed-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>

---
  include/linux/cgroup-defs.h |  6 +++++
  kernel/cgroup/rstat.c       | 45 ++++++++++++++++++++++++-------------
  2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 265da00a1a8b..daaf6d4eb8b6 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -491,6 +491,12 @@ struct cgroup {
         struct cgroup_rstat_cpu __percpu *rstat_cpu;
         struct list_head rstat_css_list;

+       /*
+        * A singly-linked list of cgroup structures to be rstat flushed.
+        * Protected by cgroup_rstat_lock.
Do you think we should mention that this is a scratch area for
cgroup_rstat_flush_locked()? IOW, this field will be invalid or may
contain garbage otherwise.
I can certainly add that into the comment.

It might also be useful to mention that the scope of usage for this is
for each percpu flushing iteration. The cgroup_rstat_lock can be
dropped between percpu flushing iterations, so different flushers can
reuse this field safely because it is re-initialized in every
iteration and only used there.

+        */
+       struct cgroup   *rstat_flush_next;
+
         /* cgroup basic resource statistics */
         struct cgroup_base_stat last_bstat;
         struct cgroup_base_stat bstat;
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index d80d7a608141..a86d40ed8bda 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -145,6 +145,34 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos,
         return pos;
  }

+/*
+ * Return a list of updated cgroups to be flushed
+ */
Why not just on a single line?
/* Return a list of updated cgroups to be flushed */

Yes, it can be compressed into a one-liner.

Thanks for the review and suggestion.

Cheers,
Longman
