Hi Michal,

On Wed, Oct 13, 2021 at 11:01 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> On Fri, Oct 01, 2021 at 12:00:39PM -0700, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > In this patch we kept the stats update codepath very minimal and let the
> > stats reader side to flush the stats only when the updates are over a
> > specific threshold. For now the threshold is (nr_cpus * CHARGE_BATCH).
>
> BTW, a noob question -- are the updates always single page sized?
>
> This is motivated by apples vs oranges comparison since the
>   nr_cpus * MEMCG_CHARGE_BATCH
> suggests what could the expected error be in pages (bytes). But it's mostly
> wrong since: a) uncertain single-page updates, b) various counter
> updates summed together. I wonder whether the formula can serve to
> provide at least some (upper) estimate.
>

Thanks for your review. This forced me to think more about it, because each
update is not necessarily a single-page-sized update, e.g. adding a hugepage
to an LRU. I think the error is still time-bounded by 2 seconds, but within
those 2 seconds it can mathematically become large.

What do you think of the following change? It bounds the error better within
the 2-second window.


From e87a36eedd02b0d10d8f66f83833bd6e2bae17b8 Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeelb@xxxxxxxxxx>
Date: Thu, 14 Oct 2021 08:49:06 -0700
Subject: [PATCH] Better bounds on the stats error

---
 mm/memcontrol.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8f1d9c028897..e5d5c850a521 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -626,14 +626,20 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
 static DEFINE_SPINLOCK(stats_flush_lock);
-static DEFINE_PER_CPU(unsigned int, stats_updates);
+static DEFINE_PER_CPU(int, stats_diff);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 
-static inline void memcg_rstat_updated(struct mem_cgroup *memcg)
+static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
+	unsigned int x;
+
 	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
-	if (!(__this_cpu_inc_return(stats_updates) % MEMCG_CHARGE_BATCH))
-		atomic_inc(&stats_flush_threshold);
+
+	x = abs(__this_cpu_add_return(stats_diff, val));
+	if (x > MEMCG_CHARGE_BATCH) {
+		atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
+		__this_cpu_write(stats_diff, 0);
+	}
 }
 
 static void __mem_cgroup_flush_stats(void)
@@ -672,7 +678,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 		return;
 
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
-	memcg_rstat_updated(memcg);
+	memcg_rstat_updated(memcg, val);
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
@@ -705,7 +711,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	/* Update lruvec */
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
 
-	memcg_rstat_updated(memcg);
+	memcg_rstat_updated(memcg, val);
 }
 
 /**
@@ -807,7 +813,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 		return;
 
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
-	memcg_rstat_updated(memcg);
+	memcg_rstat_updated(memcg, count);
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
-- 
2.33.0.882.g93a45727a2-goog
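
P.S. To make the bound concrete, below is a minimal single-threaded userspace
sketch of the accumulation logic in memcg_rstat_updated() above. It is not
kernel code: the batch value, the helper name rstat_updated() and the sample
update sizes are stand-ins for illustration only. It shows that the per-CPU
residue (stats_diff) never exceeds the batch size after any update, no matter
how large individual updates are, while anything bigger is converted into
stats_flush_threshold increments that the reader side can act on.

#include <stdio.h>
#include <stdlib.h>

/* illustrative stand-in for the kernel's MEMCG_CHARGE_BATCH */
#define MEMCG_CHARGE_BATCH 32

static int stats_diff;              /* simulates the per-CPU signed residue */
static long stats_flush_threshold;  /* simulates the global atomic flush hint */

/* single-CPU analog of memcg_rstat_updated() from the patch above */
static void rstat_updated(int val)
{
	unsigned int x;

	stats_diff += val;
	x = abs(stats_diff);
	if (x > MEMCG_CHARGE_BATCH) {
		stats_flush_threshold += x / MEMCG_CHARGE_BATCH;
		stats_diff = 0;
	}
}

int main(void)
{
	/* a mix of single-page, hugepage-sized and negative updates */
	int updates[] = { 1, 512, -1, 1, -512, 200, 7 };
	size_t i;

	for (i = 0; i < sizeof(updates) / sizeof(updates[0]); i++) {
		rstat_updated(updates[i]);
		printf("update %4d -> residue %4d, threshold %ld\n",
		       updates[i], stats_diff, stats_flush_threshold);
	}
	return 0;
}

With this, the unflushed error should stay on the order of
nr_cpus * MEMCG_CHARGE_BATCH in update units (pages/bytes/events) rather than
in number of updates, which is what the change above is aiming for.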