Re: Possible regression with cgroups in 3.11

Michal Hocko <mhocko@xxxxxxx> · Tue, 12 Nov 2013 15:58:24 +0100

On Mon 11-11-13 16:31:48, Michal Hocko wrote:
> On Thu 07-11-13 18:53:01, Johannes Weiner wrote:
> [...]
> > Unfortunately, memory cgroups don't hold the rcu_read_lock() over both
> > the tryget and the res_counter charge that would make it visible to
> > offline_css(), which means that there is a possible race condition
> > between cgroup teardown and an ongoing charge:
> > 
> > #0: destroy                #1: charge
> > 
> >                            rcu_read_lock()
> >                            css_tryget()
> >                            rcu_read_unlock()
> > disable tryget()
> > call_rcu()
> >   offline_css()
> >     reparent_charges()
> >                            res_counter_charge()
> >                            css_put()
> >                              css_free()
> >                            pc->mem_cgroup = deadcg
> >                            add page to lru
> 
> AFAICS this might only happen when a charge is done by a group nonmember
> (the group has to be empty at the destruction path, right?) and this is
> done only for the swap accounting. What we can do instead is that we
> might recheck after the charge has been done and cancel + fallback to
> current if we see that the group went away. Not nice either but it
> shouldn't be intrusive and should work:
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d1fded477ef6..3d69a3fe4c55 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4006,6 +4006,24 @@ static int __mem_cgroup_try_charge_swapin(struct mm_struct *mm,
>  		goto charge_cur_mm;
>  	*memcgp = memcg;
>  	ret = __mem_cgroup_try_charge(NULL, mask, 1, memcgp, true);
> +
> +	/*
> +	 * try_get_mem_cgroup_from_page and the actual charge are not
> +	 * done in the same RCU read section which means that the memcg
> +	 * might get offlined before res counter is charged so we have
> +	 * to recheck the memcg status again here and revert that charge
> +	 * as we cannot be sure it was accounted properly.
> +	 */
> +	if (!ret) {
> +		/* XXX: can we have something like css_online() check? */

Hmm, there is CSS_ONLINE and it has a different meaning that we would
need.
css_alive() would be a better fit. Something like the following:

static inline bool percpu_ref_alive(struct percpu_ref *ref)
{
	int ret = false;

	rcu_lockdep_assert(rcu_read_lock_held(), "percpu_ref_alive needs an RCU lock");
	return REF_STATUS(pcpu_count) == PCPU_REF_PTR;
}

static inline bool css_alive(struct cgroup_subsys_state *css)
{
	return percpu_ref_alive(&css->refcnt);
}

and for our use we would have something like the following in
__mem_cgroup_try_charge_swapin:

	memcg = try_get_mem_cgroup_from_page
	[...]
	__mem_cgroup_try_charge(...);

	/*
         * try_get_mem_cgroup_from_page and the actual charge are not
         * done in the same RCU read section which means that the memcg
         * might get offlined before res counter is charged if the
         * current charger is not a memcg memeber so we have to recheck
         * the memcg's css status again here and revert that charge as
         * we cannot be sure it was accounted properly.
	 */
	if (!ret) {
		rcu_read_lock();
		if (!css_alive(&memcg->css)) {
			__mem_cgroup_cancel_charge(memcg, 1);
			rcu_read_unlock();
			css_put(&memcg->css);
			*memcgp = NULL;
			goto charge_cur_mm;
		}
		rcu_read_unlock();
	}

> What do you think? I guess, in the long term we somehow have to mark
> alien charges and delay css_offlining until they are done to prevent
> hacks like above.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html