On Mon 25-11-13 15:03:50, Markus Blank-Burian wrote:
> > Maybe it is stuck on some other blocking operation (you've said you have
> > the fix for too many workers applied, right?)
> >
>
> For the last trace, I had not applied the cgroup work queue patch.

OK, that makes more sense now. The worker was probably hanging in
lru_add_drain_all, waiting for its per-cpu workers or something like
that.

> I just made some new traces with the applied patch, same problem. Now
> there is only the one unmatched "going offline" from the thread which
> actually gets stuck in "reparent charges".

OK, this would suggest that some charges were accounted to a different
group than the one whose LRUs hold the corresponding pages, or that the
charge cache (stock) is b0rked. The latter can be checked easily by
making refill_stock a no-op (see the patch below), although I am
skeptical that would help.

Let's rule out some usual suspects while I am staring at the code. Are
the tasks migrated between groups? What is the value of
memory.move_charge_at_immigrate? Have you seen any memcg OOM messages in
the log?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index afe7c84d823f..de8375463d59 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2455,14 +2455,7 @@ static void __init memcg_stock_init(void)
  */
 static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
-
-	if (stock->cached != memcg) { /* reset if necessary */
-		drain_stock(stock);
-		stock->cached = memcg;
-	}
-	stock->nr_pages += nr_pages;
-	put_cpu_var(memcg_stock);
+	return;
 }
 
 /*
-- 
Michal Hocko
SUSE Labs
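
For readers unfamiliar with the charge cache (stock) mentioned above, the following is a minimal user-space sketch of the idea, not the kernel implementation. It models a single CPU only. The names refill_stock and drain_stock mirror the functions in the diff; everything else (struct group, struct stock, the usage field, charge, uncharge, and the CHARGE_BATCH value) is an assumption made for illustration. The point it tries to show is that pages parked in the stock still count against the group, so its usage cannot reach zero until the stock is drained.

```c
#include <assert.h>
#include <stddef.h>

#define CHARGE_BATCH 32U	/* illustrative batch size (assumption) */

/* Hypothetical stand-in for a memory cgroup's usage counter. */
struct group {
	unsigned long usage;	/* pages currently charged to the group */
};

/* Hypothetical stand-in for one CPU's charge cache. */
struct stock {
	struct group *cached;	/* group the parked pages are charged to */
	unsigned int nr_pages;	/* charged pages not backing any real page */
};

/* Hand the parked pages back to the group's counter and forget the group. */
static void drain_stock(struct stock *s)
{
	if (s->cached) {
		assert(s->cached->usage >= s->nr_pages);
		s->cached->usage -= s->nr_pages;
	}
	s->nr_pages = 0;
	s->cached = NULL;
}

/* Park already-charged surplus pages in the cache for later reuse. */
static void refill_stock(struct stock *s, struct group *g, unsigned int nr_pages)
{
	if (s->cached != g) {	/* reset if necessary */
		drain_stock(s);
		s->cached = g;
	}
	s->nr_pages += nr_pages;
}

/* Charge a whole batch up front and park whatever was not needed. */
static void charge(struct stock *s, struct group *g, unsigned int nr_pages)
{
	unsigned int batch = nr_pages > CHARGE_BATCH ? nr_pages : CHARGE_BATCH;

	g->usage += batch;
	if (batch > nr_pages)
		refill_stock(s, g, batch - nr_pages);
}

/* Release the charge held by nr_pages real pages. */
static void uncharge(struct group *g, unsigned int nr_pages)
{
	assert(g->usage >= nr_pages);
	g->usage -= nr_pages;
}
```

In this model, charging one page and then uncharging it leaves the group with a nonzero usage made up entirely of parked pages; only drain_stock brings it back to zero. A cache that attributes parked pages to the wrong group, or is never drained, would therefore leave a residual charge that no amount of LRU scanning can find.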