On Thu, Apr 22, 2021 at 11:47:05AM +0800, Muchun Song wrote:
> On Thu, Apr 22, 2021 at 8:57 AM Roman Gushchin <guro@xxxxxx> wrote:
> >
> > On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> > > On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > > >
> > > > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > > > The below scenario can cause the page counters of the root_mem_cgroup
> > > > > > > to be out of balance.
> > > > > > >
> > > > > > > CPU0:                                   CPU1:
> > > > > > >
> > > > > > > objcg = get_obj_cgroup_from_current()
> > > > > > > obj_cgroup_charge_pages(objcg)
> > > > > > >                                         memcg_reparent_objcgs()
> > > > > > >                                             // reparent to root_mem_cgroup
> > > > > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > > > > >     // memcg == root_mem_cgroup
> > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > >     // do not charge to the root_mem_cgroup
> > > > > > >     try_charge(memcg)
> > > > > > >
> > > > > > > obj_cgroup_uncharge_pages(objcg)
> > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > >     // uncharge from the root_mem_cgroup
> > > > > > >     page_counter_uncharge(&memcg->memory)
> > > > > > >
> > > > > > > This can cause the page counter to be less than the actual value,
> > > > > > > Although we do not display the value (mem_cgroup_usage) so there
> > > > > > > shouldn't be any actual problem, but there is a WARN_ON_ONCE in
> > > > > > > the page_counter_cancel(). Who knows if it will trigger? So it
> > > > > > > is better to fix it.
> > > > > >
> > > > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > > > kmem objects to root memcg and left all other try_charge users intact.
> > > > >
> > > > > The object cgroup is special (because the page can reparent). Only the
> > > > > user of objcg APIs should be fixed.
> > > > >
> > > > > > The reason is likely that those are not reparented now but that just
> > > > > > adds an inconsistency.
> > > > > >
> > > > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > > > to check for the root memcg and bail out early?
> > > > >
> > > > > Because obj_cgroup_uncharge_pages() uncharges pages from the
> > > > > root memcg unconditionally. Why? Because some pages can be
> > > > > reparented to root memcg, in order to ensure the correctness of
> > > > > page counter of root memcg. We have to uncharge pages from
> > > > > root memcg. So we do not check whether the page belongs to
> > > > > the root memcg when it uncharges.
> > > >
> > > > I am not sure I follow. Let me ask differently. Wouldn't you
> > > > achieve the same if you simply didn't uncharge root memcg in
> > > > obj_cgroup_charge_pages?
> > >
> > > I'm afraid not. Some pages should uncharge root memcg, some
> > > pages should not uncharge root memcg. But all those pages belong
> > > to the root memcg. We cannot distinguish between the two.
> > >
> > > I believe Roman is very familiar with this mechanism (objcg APIs).
> > >
> > > Hi Roman,
> > >
> > > Any thoughts on this?
> >
> > First, unfortunately we do export the root's counter on cgroup v1:
> > /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
> > But we don't ignore these counters for the root mem cgroup, so there
> > are no bugs here. (Otherwise, please, reproduce it). So it's all about
> > the potential warning in page_counter_cancel().
>
> Right.
>
> >
> > The patch looks technically correct to me. Not sure about __try_charge()
> > naming, we never use "__" prefix to do something with the root_mem_cgroup.
> >
> > The commit message should be more clear and mention the following:
> > get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
> > so we never explicitly charge the root_mem_cgroup. And it's not
> > going to change.
> > It's all about a race when we got an obj_cgroup pointing at some non-root
> > memcg, but before we were able to charge it, the cgroup was gone, objcg was
> > reparented to the root and so we're skipping the charging. Then we store the
> > objcg pointer and later use to uncharge the root_mem_cgroup.
>
> Very clear. Thanks.
>
> >
> > But honestly I'm not sure the problem is worth the time spent on the fix
> > and the discussion. It's a small race and it's generally hard to trigger
> > a kernel allocation racing with a cgroup deletion and then you need *a lot*
> > of such races and then maybe there will be a single warning printed without
> > *any* other consequences.
>
> I agree the race is very small. Since the fix is easy, but a little confusing
> to someone. I want to hear other people's suggestions on whether to fix it.

I'm not opposing the idea to fix this issue. But, __please__, make sure you include
all necessary information into the commit log.

Thanks!