Re: Possible regression with cgroups in 3.11

Michal Hocko <mhocko@xxxxxxx> · Tue, 26 Nov 2013 16:21:24 +0100

On Mon 25-11-13 15:03:50, Markus Blank-Burian wrote:
> > Maybe it is stuck on some other blocking operation (you've said you have
> > the fix for too many workers applied, right?)
> >
> 
> For the last trace, I had not applied the cgroup work queue patch.

OK, that makes more sense now. The worker was probably hanging on
lru_add_drain_all waiting for its per-cpu workers or something like that.

> I just made some new traces with the applied patch, same problem. Now
> there is only the one unmatched "going offline" from the thread which
> actually gets stuck in "reparent charges".

OK, this would suggest that some charges were accounted to a different
group than the one whose LRUs hold the corresponding pages, or that the
charge cache (stock) is b0rked (the latter can be checked easily by making
refill_stock a noop - see the patch below - I am skeptical that would help
though).
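
To illustrate why the stock is a suspect at all, here is a simplified
user-space model of a per-cpu charge cache. It is only a sketch and not
the kernel code: struct group, struct stock, charge_page, uncharge_page
and the lru_pages counter are hypothetical stand-ins, and the batch size
is picked for illustration. The point is that pages parked in the stock
are already counted in the group's usage, so usage can stay above zero
with empty LRUs until the stock is drained.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative batch size; an assumption of this model. */
#define CHARGE_BATCH 32U

/* Hypothetical stand-in for a memcg: charged pages and pages on its LRUs. */
struct group {
	unsigned long usage;
	unsigned long lru_pages;
};

/* Hypothetical stand-in for one CPU's stock of pre-charged pages. */
struct stock {
	struct group *cached;
	unsigned int nr_pages;
};

/* Return the cached pre-charge to the group it was taken from. */
static void drain_stock(struct stock *s)
{
	if (s->cached != NULL) {
		assert(s->cached->usage >= s->nr_pages);
		s->cached->usage -= s->nr_pages;
	}
	s->cached = NULL;
	s->nr_pages = 0;
}

/* Park already-charged pages in the stock, resetting it if necessary. */
static void refill_stock(struct stock *s, struct group *g, unsigned int nr_pages)
{
	if (s->cached != g) {
		drain_stock(s);
		s->cached = g;
	}
	s->nr_pages += nr_pages;
}

/* Charge one page: use the stock if possible, else charge a whole batch. */
static void charge_page(struct stock *s, struct group *g)
{
	if (s->cached == g && s->nr_pages > 0) {
		s->nr_pages--;
	} else {
		g->usage += CHARGE_BATCH;
		refill_stock(s, g, CHARGE_BATCH - 1);
	}
	g->lru_pages++;
}

/* Uncharge one page that was on the group's LRU. */
static void uncharge_page(struct group *g)
{
	assert(g->lru_pages > 0 && g->usage > 0);
	g->lru_pages--;
	g->usage--;
}
```

In this model, charging and then uncharging a single page leaves the LRU
empty while usage still holds the rest of the batch; only drain_stock
brings it back to zero. Usage that stays nonzero even after every stock
has been drained would point at the other possibility, a charge accounted
to the wrong group.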

Let's rule out some usual suspects while I am staring at the
code. Are the tasks migrated between groups? What is the value of
memory.move_charge_at_immigrate?  Have you seen any memcg oom messages
in the log?

---

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index afe7c84d823f..de8375463d59 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2455,14 +2455,7 @@ static void __init memcg_stock_init(void)
  */
 static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
-
-	if (stock->cached != memcg) { /* reset if necessary */
-		drain_stock(stock);
-		stock->cached = memcg;
-	}
-	stock->nr_pages += nr_pages;
-	put_cpu_var(memcg_stock);
+	return;
 }
 
 /*
-- 
Michal Hocko
SUSE Labs