Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated

On Mon, May 4, 2020 at 8:46 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Mon 04-05-20 20:34:01, Yafang Shao wrote:
> > On Mon, May 4, 2020 at 4:18 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > [It would be really great if a newer version was posted only after there
> > > was a wider consensus on the approach.]
> > >
> > > On Mon 04-05-20 00:26:21, Yafang Shao wrote:
> > > > Recently Shakeel reported an issue which also confused me several months
> > > > earlier. Below is his report -
> > > > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > > > succeed. However if oom-killer does not find a process for killing, it
> > > > dumps a lot of warnings.
> > > > Deleting a memcg does not reclaim memory from it and the memory can
> > > > linger till there is memory pressure. One normal way to proactively
> > > > reclaim such memory is to set memory.max to 0 just before deleting the
> > > > memcg. However if some of the memcg's memory is pinned by others, this
> > > > operation can trigger an oom-kill without any process and thus can log a
> > > > lot of un-needed warnings. So, ignore all such warnings from memory.max.
> > > >
> > > > A better way to avoid this issue is to avoid trying to kill a process if
> > > > memcg is not populated.
> > > > Note that OOM is different from OOM kill. OOM is a status that the
> > > > system or memcg is out of memory, while OOM kill is a result that a
> > > > process inside this memcg is killed when this memcg is in OOM status.
> > >
> > > Agreed.
> > >
> > > > That is the same reason why there are both a MEMCG_OOM event and a
> > > > MEMCG_OOM_KILL event. If we already know that there's nothing to
> > > > kill, i.e. the memcg is not populated, then we don't need to try.
> > >
> > > OK, but you are not explaining why a silent failure is really better
> > > than no oom report under oom situation. With your patch, there is
> > > no failure reported to the user and there is also no sign that there
> > > might be a problem that memcg leaves memory behind that is not bound to
> > > any (killable) process. This could be important information.
> > >
> >
> > That is not a silent failure. An oom event will be reported.
> > The user can get this event from memory.events or memory.events.local if
> > he really cares about it.
>
> You are right. The oom situation will be reported (somehow) but there
> might be several reasons why no task has been killed, and there is no
> way to report that there were no eligible tasks.
>
> > Especially when the admin sets memory.max to 0 to drop all the caches,
> > many of the oom logs are just noise. Besides that, there are some side
> > effects; for example, too many oom logs printed to a slow console may
> > cause latency spikes.
>
> But the oom situation and the oom report is simply something an admin
> has to expect especially when the hard limit is set to 0. With kmem
> accounting there is no guarantee that the target will be met.

I have always wondered why we don't move the kmem from this memcg to
the root_mem_cgroup in this situation.
Then this memcg could be easily reclaimed.

> >
> >
> > > Besides that I really do not see any actual problem that this would be
> > > fixing.
> >
> > Avoid printing too many oom logs.
>
> There is only a single oom report printed so I disagree this is really a
> proper justification.
>
> Unless you can come up with a better justification I am against this
> patch. It unnecessarily reduces debugging tools while it doesn't really
> provide any huge advantage. Changing the hard limit to an impossible target
> is known to trigger the oom killer and the oom report is a part of that.
> If the oom report is too noisy then we can discuss on how to make it
> more compact but making ad-hoc exceptions like this one is not a good
> solution.
> --

No better justification yet. But I think more memcg users will
complain about it.
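
For reference, the OOM versus OOM kill distinction discussed above can be
observed from userspace through the "oom" and "oom_kill" counters in the
cgroup v2 memory.events file. Below is a minimal sketch, not part of the
patch. The helper names, the sample values and the cgroup path are
assumptions made for illustration, and comparing the two counters is only
a rough heuristic (for example, memory.oom.group can kill several tasks
for a single OOM).

```python
from pathlib import Path


def parse_memory_events(text):
    """Parse the flat "key value" lines of a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            events[parts[0]] = int(parts[1])
    return events


def oom_without_kill(events):
    """Rough heuristic: OOM was hit more often than a task was killed."""
    return events.get("oom", 0) > events.get("oom_kill", 0)


def read_memory_events(cgroup_dir):
    # cgroup_dir is a hypothetical path, e.g. a directory under /sys/fs/cgroup
    return parse_memory_events((Path(cgroup_dir) / "memory.events").read_text())


# Illustrative sample content, not taken from a real system.
sample = "low 0\nhigh 0\nmax 12\noom 3\noom_kill 0\n"
events = parse_memory_events(sample)
print(events)
print("OOM without kill:", oom_without_kill(events))
```

The same parser should work for memory.events.local, which uses the same
flat "key value" layout.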

-- 
Thanks
Yafang



