Re: Possible regression with cgroups in 3.11

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 08, 2013 at 04:36:34PM +0800, Li Zefan wrote:
> >>>>      kworker/3:5-7605  [003] ....   987.475678:
> >>>> mem_cgroup_reparent_charges: usage: 1568768
> >>>>      kworker/3:5-7605  [003] ....   987.478677:
> >>>> mem_cgroup_reparent_charges: usage: 1568768
> >>>>      kworker/3:5-7605  [003] ....   987.481675:
> >>>> mem_cgroup_reparent_charges: usage: 1568768
> >>>
> >>> So it's much more likely this is a memcg bug rather than a cgroup bug.
> >>> I hope memcg guys could look into it, or you could do a git-bisect if
> >>> you can reliably reproduce the bug.
> >>
> >> I think there is a problem with ref counting and memcg.
> >>
> >> The old scheme would wait with the charge reparenting until all
> >> references were gone for good, whereas the new scheme has only a RCU
> >> grace period between disabling tryget and offlining the css.
> >> Unfortunately, memory cgroups don't hold the rcu_read_lock() over both
> >> the tryget and the res_counter charge that would make it visible to
> >> offline_css(), which means that there is a possible race condition
> >> between cgroup teardown and an ongoing charge:
> >>
> >> #0: destroy                #1: charge
> >>
> >>                            rcu_read_lock()
> >>                            css_tryget()
> >>                            rcu_read_unlock()
> >> disable tryget()
> >> call_rcu()
> >>   offline_css()
> >>     reparent_charges()
> >>                            res_counter_charge()
> >>                            css_put()
> >>                              css_free()
> >>                            pc->mem_cgroup = deadcg
> >>                            add page to lru
> >>
> >> If the res_counter is hierarchical, there is now a leaked charge from
> >> the dead group in the parent counter with no corresponding page on the
> >> LRU, which will lead to this endless loop when deleting the parent.
> > 
> > Hugh, I bet your problem is actually the same thing, where
> > reparent_charges is looping and the workqueue is not actually stuck,
> > it's just that waiting for the CPU callbacks is the longest thing the
> > loop does so it's most likely to show up in the trace.
> > 
> >> The race window can be seconds if the res_counter hits its limit and
> >> page reclaim is entered between css_tryget() and the res counter
> >> charge succeeding.
> >>
> >> I thought about calling reparent_charges() again from css_free() at
> >> first to catch any raced charges.  But that won't work if the last
> >> reference is actually put by the charger because then it'll drop into
> >> the loop before putting the page on the LRU.
> > 
> > Actually, it *should* work after all, because the final css_put()
> > schedules css_free() in a workqueue, which will just block until the
> > charge finishes and lru-links the page.  Not the most elegant
> > behavior, but hey, neither is livelocking!
> > 
> > So how about this?
> > 
> 
> Thanks for the analysis and fix!
> 
> But we fix the bug by adding a call to mem_cgroup_reparent_charge() in
> css_free() while we are dead-looping in css_offline()? We won't call
> css_free() if css_offline() hasn't finished.

The css_free() reparenting is supposed to catch racing charges in the
child so that later on the css_offline() in the parent can find them.

> > From: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Subject: [patch] mm: memcg: reparent charges during css_free()
> > 
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Cc: stable@xxxxxxxxxx # 3.8+
> 
> FYI, I don't think Markus and google ever experience this issue
> before 3.11.

3.8 is when the cgroup destruction protocol changed.  But as per
Markus's email, his issue is actually not fixed by this, so back to
staring at the change history...
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux