Re: [PATCH v3 0/8] Change OOM killer to use list of mm_struct.

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Mon, 25 Jul 2016 23:02:35 +0900

Michal Hocko wrote:
> On Mon 25-07-16 20:47:03, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Mon 25-07-16 20:07:11, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > > Are you planning to change the scope where the OOM victims can access memory
> > > > > > reserves?
> > > > > 
> > > > > Yes. Because we know that there are some post exit_mm allocations and I
> > > > > do not want to get back to PF_EXITING and other tricks...
> > > > > 
> > > > > > (1) If you plan to allow the OOM victims to access memory reserves until
> > > > > >     TASK_DEAD, tsk_is_oom_victim() will be as trivial as
> > > > > > 
> > > > > > bool tsk_is_oom_victim(struct task_struct *task)
> > > > > > {
> > > > > > 	return task->signal->oom_mm;
> > > > > > }
> > > > > 
> > > > > yes, exactly. That's what I've tried to say above. with the oom_mm this
> > > > > is trivial to implement while mm lists will not help us much due to
> > > > > their life time. This also means that we know about the oom victim until
> > > > > it is unhashed and become invisible to the oom killer.
> > > > 
> > > > Then, what are advantages with allowing only OOM victims access to memory
> > > > reserves after they left exit_mm()?
> > > 
> > > Because they might need it in order to move on... Say you want to close
> > > all the files which might release considerable amount of memory or any
> > > other post exit_mm() resources.
> > 
> > OOM victims might need memory reserves in order to move on, but non OOM victims
> > might also need memory reserves in order to move on. And non OOM victims might
> > be blocking OOM victims via locks.
> 
> Yes that might be true but OOM situations are rare events and quite
> reduced in the scope. Considering all exiting tasks is more dangerous
> because they might deplete those memory reserves easily.

Why do you assume that we grant all of memory reserves?
I'm suggesting that we grant portion of memory reserves.
Killed/exiting tasks cannot deplete memory reserves.

> 
> > > > Since we assume that mm_struct is the primary source of memory consumption,
> > > > we don't select threads which already left exit_mm(). Since we assume that
> > > > mm_struct is the primary source of memory consumption, why should we
> > > > distinguish OOM victims and non OOM victims after they left exit_mm()?
> > > 
> > > Because we might prevent from pointless OOM killer selection that way.
> > 
> > That "might" sounds obscure to me.
> > 
> > If currently allocating task is not an OOM victim then not giving it
> > access to memory reserves will cause OOM victim selection.
> 
> Sure, that is true. I am talking about the case where the current victim
> tries to get out and exit and it needs a memory for that.
> 
> > We might prevent from pointless OOM victim selection by giving
> > killed/exiting tasks access to memory reserves.
> 
> This will open risks for other problems, I am afraid. Please note that
> we are only trying to reduce the damage as much as possible. There is no
> 100% correct thing to do.

My suggestion (allowing only portion of memory reserves) includes that
memory allocations done by killed/exiting tasks do not give up. That is,
try to guarantee that memory allocations for commit/cleanup operations
do not fail due to use of ALLOC_NO_WATERMARKS, for there is no means for
killed/exiting tasks to handle problems caused by memory allocation
failures.

> 
> > > If we know that the currently allocating task is an OOM victim then
> > > giving it access to memory reserves is preferable to selecting another
> > > oom victim.
> > 
> > If we know that the currently allocating task is killed/exiting then
> > giving it access to memory reserves is preferable to selecting another
> > OOM victim.
> 
> I believe this is getting getting off topic. Can we get back to mm list
> vs signal::oom_mm decision? I have expressed one aspect that would speak
> for oom_mm as it provides a persistent and easy to detect oom victim
> which would be tricky with the mm list approach. Could you name some
> arguments which would speak for the mm list and would be a problem with
> the other approach?

I thought we are talking about future plan. I didn't know you are asking for
some arguments which would speak for the mm list.

Since the mm list approach turned out that we after all need victim's
task_struct in order to test eligibility of victim's mm, the signal::oom_mm
approach will be easier to access both victim's task_struct and victim's mm
than the mm list approach. I'm fine with signal::oom_mm approach regarding
oom_scan_process_thread() part.

But I don't like use of ALLOC_NO_WATERMARKS by signal::oom_mm != NULL tasks
after they passed exit_mm(). Such behavior may cause post-exit_mm() allocation
requests which might be doing commit/cleanup operations to start failing. I'm
trying to reduce the damage as much as possible by not giving up memory
allocations by OOM victims or by killed/exiting tasks (unless __GFP_KILLABLE
is used and killed by SIGKILL). My approach will select next OOM victim when
killed/exiting tasks cannot satisfy their allocation requests even if some
portion of memory reserves are granted because my approach does not use
ALLOC_NO_WATERMARKS.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>