Re: [PATCH] mm, memcg: show memcg min setting in oom messages

On Fri, Nov 22, 2019 at 6:28 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Wed 20-11-19 20:23:54, Yafang Shao wrote:
> > On Wed, Nov 20, 2019 at 7:40 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Wed 20-11-19 18:53:44, Yafang Shao wrote:
> > > > On Wed, Nov 20, 2019 at 6:22 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed 20-11-19 03:53:05, Yafang Shao wrote:
> > > > > > A task running in a memcg may OOM because of the memory.min settings of its
> > > > > > sibling and parent. If this happens, the current oom messages can't show
> > > > > > why file page cache can't be reclaimed.
> > > > >
> > > > > min limit is not the only way to protect memory from being reclaimed. The
> > > > > memory might be pinned or unreclaimable for other reasons (e.g. swap
> > > > > quota exceeded for memcg).
> > > >
> > > > Both swap and unreclaimable (unevictable) memory are printed in OOM messages.
> > >
> > > Not really. Consider a memcg which has reached its swap limit. The
> > > anonymous memory is not really reclaimable even when there is a lot of
> > > swap space available.
> > >
> >
> > The memcg swap limit is already printed in oom messages, see below:
> >
> > [  141.721625] memory: usage 1228800kB, limit 1228800kB, failcnt 18337
> > [  141.721958] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
>
> But you do not have any insight on the swap limit down the oom
> hierarchy, do you?
>
> > > > Why not just print the memcgs which are under memory.min protection, or
> > > > something like the total amount of min-protected memory?
> > >
> > > Yes, this would likely help. But the main question really remains, is
> > > this really worth it?
> > >
> >
> > If it doesn't cost too much, I think it is worth doing.
> > As the oom path is not a critical path, adding some print info
> > should not add much overhead.
>
> Generating a lot of output for the oom reports has been a real problem
> in many deployments.

So why not print only the non-zero counters?
If a counter is 0, we don't print it, which would shorten the oom reports.

Something like "isolated_file:0 unevictable:0 dirty:0 writeback:0
unstable:0" can all be removed, and a counter that is not printed is
considered to be zero by default.
I mean we can optimize the OOM reports and print only the useful
information, so that they are no longer a problem in many deployments.
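
To illustrate, here is a minimal userspace sketch of that filtering, not
the kernel's actual reporting code. The struct oom_counter type and the
format_nonzero_counters() helper are hypothetical names made up for this
example; only the "name:value" layout is taken from the report line
quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical counter record; not a kernel data structure. */
struct oom_counter {
	const char *name;
	unsigned long value;
};

/*
 * Format only the non-zero counters into buf as "name:value" pairs
 * separated by single spaces. Returns the number of counters written.
 */
static int format_nonzero_counters(char *buf, size_t size,
				   const struct oom_counter *c, size_t n)
{
	size_t off = 0;
	int written = 0;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		int len;

		if (c[i].value == 0)
			continue;	/* a missing counter is read as zero */

		len = snprintf(buf + off, size - off, "%s%s:%lu",
			       written ? " " : "", c[i].name, c[i].value);
		if (len < 0 || (size_t)len >= size - off) {
			buf[off] = '\0';	/* drop the truncated entry */
			break;
		}
		off += (size_t)len;
		written++;
	}
	return written;
}
```

With made-up values active_file:12, isolated_file:0, unevictable:0 and
dirty:3, the buffer would hold only the active_file and dirty entries.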

> [...]
> > > > I have said in the commit log, that we don't know why the file cache
> > > > can't be reclaimed (when evictable is 0 and dirty is 0 as well.)
> > >
> > > And the counter argument is that this will not help you there much in
> > > many large and much more common cases.
> > >
> > > I argue, and I might be wrong here so feel free to correct me, that the
> > > reclaim protection guarantee (min) is something to be under the admin's
> > > control. It shouldn't really happen willy-nilly because it has really
> > > large consequences, OOM included. So if there is a suspicious
> > > amount of memory that could be reclaimed normally then the reclaim
> > > protection is really the first suspect to go after.
> > > --
> >
> > I don't know whether it happens willy-nilly or not.
>
> It is a reclaim protection guarantee (so essentially an mlock-like
> thing), so it had better be properly considered when used.
>
> > But if we all know that it may cause OOMs and it doesn't take too much
> > effort to show it in the OOM messages,
>
> I do not think we are in agreement here. As mentioned above the oom
> report is quite heavy already. So it should be the other way around. There
> should be a strong reason to add something more: a real use case where
> not having that information makes debugging ooms considerably
> harder.
>
> --
> Michal Hocko
> SUSE Labs



