Re: [PATCH] mm,oom: Exclude TIF_MEMDIE processes from candidates.

Michal Hocko wrote:
> On Fri 08-01-16 00:38:43, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > @@ -333,6 +333,14 @@ static struct task_struct *select_bad_process(struct oom_control *oc,
> > >  		if (points == chosen_points && thread_group_leader(chosen))
> > >  			continue;
> > >  
> > > +		/*
> > > +		 * If the current major task is already oom killed and this
> > > +		 * is sysrq+f request then we rather choose somebody else
> > > +		 * because the current oom victim might be stuck.
> > > +		 */
> > > +		if (is_sysrq_oom(sc) && test_tsk_thread_flag(p, TIF_MEMDIE))
> > > +			continue;
> > > +
> > >  		chosen = p;
> > >  		chosen_points = points;
> > >  	}
> > 
> > Do we want to require SysRq-f for each thread in a process?
> > If g has 1024 p, dump_tasks() will do
> > 
> >   pr_info("[%5d] %5d %5d %8lu %8lu %7ld %7ld %8lu         %5hd %s\n",
> > 
> > for 1024 times? I think one SysRq-f per one process is sufficient.
> 
> I am not following you here. If we kill the process the whole process
> group (aka all threads) will get killed whichever thread we happen to
> send the sigkill to.

Please distinguish "sending SIGKILL to a process" from "all threads in that
process terminating". do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true)
sends SIGKILL to a victim process, but it does not guarantee that all
threads in that process terminate, even if the OOM reaper has reclaimed memory.
That is when SysRq-f (and timeout-based next victim selection) is needed,
but currently SysRq-f keeps selecting the same wrong process forever.
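
A minimal userspace sketch of this distinction, not kernel code. The struct and
the model_* helpers are hypothetical names introduced only for illustration,
and "blocked" is an assumed stand-in for a thread that cannot act on the signal
yet (for example one in uninterruptible sleep).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one thread; none of these names are kernel APIs. */
struct model_thread {
	bool sigkill_pending;	/* SIGKILL has been queued for this thread */
	bool blocked;		/* assumed: cannot act on signals right now */
	bool exited;		/* thread has actually terminated */
};

/* Model of "sending SIGKILL to a process": every thread gets the signal. */
static void model_send_sigkill(struct model_thread *t, size_t n)
{
	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		t[i].sigkill_pending = true;
}

/* One step of progress: only threads that are not blocked can exit. */
static void model_run(struct model_thread *t, size_t n)
{
	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (t[i].sigkill_pending && !t[i].blocked)
			t[i].exited = true;
}

/* Model of "all threads in that process terminated". */
static bool model_all_exited(const struct model_thread *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!t[i].exited)
			return false;
	return true;
}
```

In this model, calling model_send_sigkill() and then model_run() on a group
with one blocked thread leaves model_all_exited() false until that thread is
unblocked and model_run() is called again.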

I can observe that SysRq-f is effectively disabled
(Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160112.txt.xz .)
----------
[   86.767482] a.out invoked oom-killer: order=0, oom_score_adj=0, gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|GFP_ZERO)
[   86.769905] a.out cpuset=/ mems_allowed=0
[   86.771393] CPU: 2 PID: 9573 Comm: a.out Not tainted 4.4.0-next-20160112+ #279
(...snipped...)
[   86.874710] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
(...snipped...)
[   86.945286] [ 9573]  1000  9573   541717   402522     796       6        0             0 a.out
[   86.947457] [ 9574]  1000  9574     1078       21       7       3        0             0 a.out
[   86.949568] Out of memory: Kill process 9573 (a.out) score 908 or sacrifice child
[   86.951538] Killed process 9574 (a.out) total-vm:4312kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[   86.955296] systemd-journal invoked oom-killer: order=0, oom_score_adj=0, gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|GFP_COLD)
[   86.958035] systemd-journal cpuset=/ mems_allowed=0
(...snipped...)
[   87.128808] [ 9573]  1000  9573   541717   402522     796       6        0             0 a.out
[   87.130926] [ 9575]  1000  9574     1078        0       7       3        0             0 a.out
[   87.133055] Out of memory: Kill process 9573 (a.out) score 908 or sacrifice child
[   87.134989] Killed process 9575 (a.out) total-vm:4312kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  116.979564] sysrq: SysRq : Manual OOM execution
[  116.984119] kworker/0:8 invoked oom-killer: order=-1, oom_score_adj=0, gfp_mask=0x24000c0(GFP_KERNEL)
[  116.986367] kworker/0:8 cpuset=/ mems_allowed=0
(...snipped...)
[  117.157045] [ 9573]  1000  9573   541717   402522     797       6        0             0 a.out
[  117.159191] [ 9575]  1000  9574     1078        0       7       3        0             0 a.out
[  117.161302] Out of memory: Kill process 9573 (a.out) score 908 or sacrifice child
[  117.163250] Killed process 9575 (a.out) total-vm:4312kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  119.043685] sysrq: SysRq : Manual OOM execution
[  119.046239] kworker/0:8 invoked oom-killer: order=-1, oom_score_adj=0, gfp_mask=0x24000c0(GFP_KERNEL)
[  119.048453] kworker/0:8 cpuset=/ mems_allowed=0
(...snipped...)
[  119.215982] [ 9573]  1000  9573   541717   402522     797       6        0             0 a.out
[  119.218122] [ 9575]  1000  9574     1078        0       7       3        0             0 a.out
[  119.220237] Out of memory: Kill process 9573 (a.out) score 908 or sacrifice child
[  119.222129] Killed process 9575 (a.out) total-vm:4312kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  120.179644] sysrq: SysRq : Manual OOM execution
[  120.206938] kworker/0:8 invoked oom-killer: order=-1, oom_score_adj=0, gfp_mask=0x24000c0(GFP_KERNEL)
[  120.209152] kworker/0:8 cpuset=/ mems_allowed=0
(...snipped...)
[  120.376821] [ 9573]  1000  9573   541717   402522     797       6        0             0 a.out
[  120.378924] [ 9575]  1000  9574     1078        0       7       3        0             0 a.out
[  120.381065] Out of memory: Kill process 9573 (a.out) score 908 or sacrifice child
[  120.382929] Killed process 9575 (a.out) total-vm:4312kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  121.235296] sysrq: SysRq : Manual OOM execution
[  121.252742] kworker/0:8 invoked oom-killer: order=-1, oom_score_adj=0, gfp_mask=0x24000c0(GFP_KERNEL)
[  121.254955] kworker/0:8 cpuset=/ mems_allowed=0
(...snipped...)
[  141.024984] a.out           D ffff88007c417948     0  9573   8117 0x00000080
[  141.026830]  ffff88007c417948 ffff880076cac2c0 ffff880076c442c0 ffff88007c418000
[  141.028789]  ffff88007c417980 ffff88007fc90240 00000000fffd7aa1 00000000000006bc
[  141.030746]  ffff88007c417960 ffffffff816fc1a7 ffff88007fc90240 ffff88007c417a08
[  141.032703] Call Trace:
[  141.033653]  [<ffffffff816fc1a7>] schedule+0x37/0x90
[  141.035056]  [<ffffffff81700567>] schedule_timeout+0x117/0x1c0
[  141.036629]  [<ffffffff810e1310>] ? init_timer_key+0x40/0x40
[  141.038182]  [<ffffffff81700694>] schedule_timeout_uninterruptible+0x24/0x30
[  141.039963]  [<ffffffff8114944b>] __alloc_pages_nodemask+0x91b/0xd90
[  141.041631]  [<ffffffff811925e6>] alloc_pages_vma+0xb6/0x290
[  141.043173]  [<ffffffff811711d0>] handle_mm_fault+0x1180/0x1630
[  141.044770]  [<ffffffff811700a4>] ? handle_mm_fault+0x54/0x1630
[  141.046355]  [<ffffffff8105a651>] __do_page_fault+0x1a1/0x440
[  141.047915]  [<ffffffff8105a920>] do_page_fault+0x30/0x80
[  141.049408]  [<ffffffff81702307>] ? native_iret+0x7/0x7
[  141.050876]  [<ffffffff817033e8>] page_fault+0x28/0x30
[  141.052327]  [<ffffffff813a6f3d>] ? __clear_user+0x3d/0x70
[  141.053831]  [<ffffffff813ab9e8>] iov_iter_zero+0x68/0x250
[  141.055346]  [<ffffffff814866a8>] read_iter_zero+0x38/0xb0
[  141.056854]  [<ffffffff811c0994>] __vfs_read+0xc4/0xf0
[  141.058295]  [<ffffffff811c154a>] vfs_read+0x7a/0x120
[  141.059711]  [<ffffffff811c1df3>] SyS_read+0x53/0xd0
[  141.061104]  [<ffffffff81701772>] entry_SYSCALL_64_fastpath+0x12/0x76
[  141.062768] a.out           x ffff88007b92fca0     0  9574   9573 0x00000084
[  141.064604]  ffff88007b92fca0 ffff880076cac2c0 ffff88007a862c80 ffff88007b930000
[  141.066555]  ffff88007a863040 ffff88007a863308 ffff88007a862c80 ffff88007cc10000
[  141.068492]  ffff88007b92fcb8 ffffffff816fc1a7 ffff88007a863308 ffff88007b92fd28
[  141.070437] Call Trace:
[  141.071389]  [<ffffffff816fc1a7>] schedule+0x37/0x90
[  141.072788]  [<ffffffff810733fe>] do_exit+0x6be/0xb50
[  141.074198]  [<ffffffff81073917>] do_group_exit+0x47/0xc0
[  141.075676]  [<ffffffff8107f122>] get_signal+0x222/0x7e0
[  141.077135]  [<ffffffff8100f232>] do_signal+0x32/0x6d0
[  141.078570]  [<ffffffff81095cc8>] ? finish_task_switch+0xa8/0x2b0
[  141.080176]  [<ffffffff8106b967>] ? syscall_slow_exit_work+0x4b/0x10d
[  141.081837]  [<ffffffff81095cc8>] ? finish_task_switch+0xa8/0x2b0
[  141.083441]  [<ffffffff8106b8ba>] ? exit_to_usermode_loop+0x2e/0x90
[  141.085063]  [<ffffffff8106b8d8>] exit_to_usermode_loop+0x4c/0x90
[  141.086667]  [<ffffffff8100355b>] syscall_return_slowpath+0xbb/0x130
[  141.088305]  [<ffffffff817018da>] int_ret_from_sys_call+0x25/0x9f
[  141.089896] a.out           D ffff88007be2fab8     0  9575   9573 0x00100084
[  141.091734]  ffff88007be2fab8 ffff880036509640 ffff8800366742c0 ffff88007be30000
[  141.093688]  0000000000000000 7fffffffffffffff ffff88007ff72cb8 ffffffff816fca00
[  141.095743]  ffff88007be2fad0 ffffffff816fc1a7 ffff88007fc17280 ffff88007be2fb70
[  141.097699] Call Trace:
[  141.098649]  [<ffffffff816fca00>] ? bit_wait+0x60/0x60
[  141.100071]  [<ffffffff816fc1a7>] schedule+0x37/0x90
[  141.101453]  [<ffffffff817005c8>] schedule_timeout+0x178/0x1c0
[  141.103001]  [<ffffffff810e81e2>] ? ktime_get+0x102/0x130
[  141.104468]  [<ffffffff810bdfd9>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[  141.106158]  [<ffffffff810be0ad>] ? trace_hardirqs_on+0xd/0x10
[  141.107698]  [<ffffffff810e8187>] ? ktime_get+0xa7/0x130
[  141.109138]  [<ffffffff811276ea>] ? __delayacct_blkio_start+0x1a/0x30
[  141.110782]  [<ffffffff816fb641>] io_schedule_timeout+0xa1/0x110
[  141.112350]  [<ffffffff816fca16>] bit_wait_io+0x16/0x70
[  141.113774]  [<ffffffff816fc62b>] __wait_on_bit+0x5b/0x90
[  141.115234]  [<ffffffff8113f83a>] ? find_get_pages_tag+0x19a/0x2c0
[  141.116824]  [<ffffffff8113e5c6>] wait_on_page_bit+0xc6/0xf0
[  141.118319]  [<ffffffff810b5830>] ? autoremove_wake_function+0x30/0x30
[  141.119983]  [<ffffffff8113e797>] __filemap_fdatawait_range+0x107/0x190
[  141.121643]  [<ffffffff81140a8c>] ? __filemap_fdatawrite_range+0xcc/0x100
[  141.123352]  [<ffffffff8113e82f>] filemap_fdatawait_range+0xf/0x30
[  141.124955]  [<ffffffff81140bad>] filemap_write_and_wait_range+0x3d/0x60
[  141.126655]  [<ffffffff812b2614>] xfs_file_fsync+0x44/0x180
[  141.128149]  [<ffffffff811f482b>] vfs_fsync_range+0x3b/0xb0
[  141.129646]  [<ffffffff812b4242>] xfs_file_write_iter+0x102/0x140
[  141.131260]  [<ffffffff811c0a87>] __vfs_write+0xc7/0x100
[  141.132702]  [<ffffffff811c168d>] vfs_write+0x9d/0x190
[  141.134108]  [<ffffffff811e104a>] ? __fget_light+0x6a/0x90
[  141.135593]  [<ffffffff811c1ec3>] SyS_write+0x53/0xd0
[  141.136998]  [<ffffffff81701772>] entry_SYSCALL_64_fastpath+0x12/0x76
[  141.138646] a.out           D ffff88007af4fce8     0  9576   9573 0x00000084
[  141.140490]  ffff88007af4fce8 ffff8800366742c0 ffff880036672c80 ffff88007af50000
[  141.142415]  ffff88007d14a5b0 ffff880036672c80 0000000000000246 00000000ffffffff
[  141.144331]  ffff88007af4fd00 ffffffff816fc1a7 ffff88007d14a5a8 ffff88007af4fd10
[  141.146308] Call Trace:
[  141.147261]  [<ffffffff816fc1a7>] schedule+0x37/0x90
[  141.148651]  [<ffffffff816fc4d0>] schedule_preempt_disabled+0x10/0x20
[  141.150326]  [<ffffffff816fd31b>] mutex_lock_nested+0x17b/0x3e0
[  141.151902]  [<ffffffff812b3faf>] ? xfs_file_buffered_aio_write+0x5f/0x1f0
[  141.153647]  [<ffffffff812b3faf>] xfs_file_buffered_aio_write+0x5f/0x1f0
[  141.155397]  [<ffffffff812b41c4>] xfs_file_write_iter+0x84/0x140
[  141.156989]  [<ffffffff811c0a87>] __vfs_write+0xc7/0x100
[  141.158460]  [<ffffffff811c168d>] vfs_write+0x9d/0x190
[  141.159933]  [<ffffffff811e104a>] ? __fget_light+0x6a/0x90
[  141.161417]  [<ffffffff811c1ec3>] SyS_write+0x53/0xd0
[  141.162853]  [<ffffffff81701772>] entry_SYSCALL_64_fastpath+0x12/0x76
(...snipped...)
[  181.154922] [ 9573]  1000  9573   541717   402522     797       6        0             0 a.out
[  181.157145] [ 9575]  1000  9574     1078        0       7       3        0             0 a.out
[  181.159265] Out of memory: Kill process 9573 (a.out) score 908 or sacrifice child
[  181.161160] Killed process 9575 (a.out) total-vm:4312kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  184.227075] sysrq: SysRq : Kill All Tasks
----------
using linux-next-20160112 without the "mm,oom: exclude TIF_MEMDIE processes from
candidates." patch and the reproducer shown below.
----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>

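/* Thread body: append zero-filled 4096-byte blocks with O_SYNC until write() fails or is short. */
static int file_writer(void *unused)
{
	static char buffer[4096] = { };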
	const int fd = open("/tmp/file",
			    O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
	while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer));
	return 0;
}

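/*
 * Grow a buffer by doubling realloc() sizes, then fault it in by reading
 * from /dev/zero so that the overcommitted memory triggers the OOM killer.
 */
static int memory_consumer(void *unused)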
{
	const int fd = open("/dev/zero", O_RDONLY);
	unsigned long size;
	char *buf = NULL;
	sleep(1);
	unlink("/tmp/file");
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	read(fd, buf, size); /* Will cause OOM due to overcommit */
	return 0;
}

int main(int argc, char *argv[])
{
	if (fork() == 0) {
		int i;
		for (i = 0; i < 10; i++) {
			char *cp = malloc(4096);
			if (!cp || clone(file_writer, cp + 4096,
					 CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL) == -1)
				break;
		}
	} else {
		memory_consumer(NULL);
	}
	while (1)
		pause();
}
----------

>  
> > How can we guarantee that find_lock_task_mm() from oom_kill_process()
> > chooses !TIF_MEMDIE thread when try_to_sacrifice_child() somehow chose
> > !TIF_MEMDIE thread? I think choosing !TIF_MEMDIE thread at
> > find_lock_task_mm() is the simplest way.
> 
> find_lock_task_mm choosing TIF_MEMDIE thread shouldn't change anything
> because the whole thread group will go down anyway. If you want to
> guarantee that the sysrq+f never chooses a task which has a TIF_MEMDIE
> thread then we would have to check for fatal_signal_pending as well
> AFAIU. Fiddling with find_lock_task_mm will not help you though
> unless I am missing something.

I do want to guarantee that SysRq-f (and timeout-based next victim
selection) never chooses a process which has a TIF_MEMDIE thread.
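
That guarantee can be sketched in a userspace model. This is an illustration
under stated assumptions, not the patch itself: struct model_task, enum
model_skip and the model_* functions are hypothetical names, each array entry
stands for one thread, and threads of the same process share a tgid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one thread as seen by victim selection. */
struct model_task {
	int pid;
	int tgid;		/* threads of one process share this */
	unsigned long points;	/* badness score */
	bool memdie;		/* stands in for TIF_MEMDIE on this thread */
};

enum model_skip {
	MODEL_SKIP_NONE,	/* never skip anything */
	MODEL_SKIP_THREAD,	/* skip only the thread that carries the flag */
	MODEL_SKIP_GROUP,	/* skip every thread of a flagged process */
};

/* True if any thread of the process identified by tgid carries the flag. */
static bool model_group_has_memdie(const struct model_task *tasks, size_t n,
				   int tgid)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].tgid == tgid && tasks[i].memdie)
			return true;
	return false;
}

/* Return the index of the highest-scoring eligible entry, or -1 if none. */
static int model_select_victim(const struct model_task *tasks, size_t n,
			       enum model_skip mode)
{
	int chosen = -1;
	unsigned long chosen_points = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (mode == MODEL_SKIP_THREAD && tasks[i].memdie)
			continue;
		if (mode == MODEL_SKIP_GROUP &&
		    model_group_has_memdie(tasks, n, tasks[i].tgid))
			continue;
		/* On equal scores, keep the earlier entry. */
		if (chosen != -1 && tasks[i].points <= chosen_points)
			continue;
		chosen = (int)i;
		chosen_points = tasks[i].points;
	}
	return chosen;
}
```

With entries (pid 100, tgid 100, 908 points), (pid 101, tgid 100, 908 points,
flagged) and (pid 200, tgid 200, 50 points), MODEL_SKIP_NONE and
MODEL_SKIP_THREAD both return index 0, so the same process is picked on every
request, while MODEL_SKIP_GROUP returns index 2.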

I don't like the current "oom: clear TIF_MEMDIE after oom_reaper managed to unmap
the address space" patch unless both the "mm,oom: exclude TIF_MEMDIE processes from
candidates." patch and the "mm,oom: Re-enable OOM killer using timers." patch are
used together with it. Since your patch covers only the likely case, it cannot
become an alternative to my patches, which cover the unlikely cases.
