Re: [PATCH 2/3] oom, oom_reaper: Try to reap tasks which skip regular OOM killer path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 11-04-16 22:26:09, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sat 09-04-16 13:39:30, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Fri 08-04-16 20:19:28, Tetsuo Handa wrote:
> > > > > I looked at next-20160408 but I again came to think that we should remove
> > > > > these shortcuts (something like a patch shown bottom).
> > > >
> > > > feel free to send the patch with the full description. But I would
> > > > really encourage you to check the history to learn why those have been
> > > > added and describe why those concerns are not valid/important anymore.
> > > 
> > > I believe that past discussions and decisions about current code are too
> > > optimistic because they did not take 'The "too small to fail" memory-
> > > allocation rule' problem into account.
> > 
> > In most cases they were driven by _real_ usecases though. And that
> > is what matters. Theoretically possible issues which happen under
> > crazy workloads which are DoSing the machine already are not something
> > to optimize for. Sure we should try to cope with them as gracefully
> > as possible, no questions about that, but we should try hard not to
> > reintroduce previous issues during _sensible_ workloads.
> 
> I'm not requesting you to optimize for crazy workloads. None of my
> customers intentionally put crazy workloads, but they are getting silent
> hangups and I'm suspecting that something went wrong with memory management.

There are many other possible reasons for thses symptoms. Have you
actually seen any _evidence_ they the hang they are seeing is due to
oom deadlock, though. A single crash dump or consistent sysrq output
which would point that direction.

> But there is no evidence because memory management subsystem remains silent.
> You call my testcases DoS, but there is no evidence that my customers
> are not hitting the same problem my testcases found.

This is really impossible to comment on.

> I'm suggesting you to at least emit diagnostic messages when something went
> wrong. That is what kmallocwd is for. And if you do not want to emit
> diagnostic messages, I'm fine with timeout based approach.

I am all for more diagnostic but what you were proposing was so heavy
weight it doesn't really seem worth it.

Anyway yet again this is getting largely off-topic...
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]