Re: [patch -mm 4/9 v2] oom: remove compulsory panic_on_oom mode

Hi.

On Wed, 17 Feb 2010 11:34:30 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Tue, 16 Feb 2010 18:28:05 -0800 (PST)
> David Rientjes <rientjes@xxxxxxxxxx> wrote:
> 
> > On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
> > 
> > > > What do you think about making pagefaults use out_of_memory() directly and 
> > > > respecting the sysctl_panic_on_oom settings?
> > > > 
> > > 
> > > I don't think this patch is good. Because several memcgs can
> > > hit oom at the same time independently, system-wide oom locking is
> > > unsuitable. BTW, what I doubt is a much more fundamental thing.
> > > 
> > 
> > We want to lock all populated zones with ZONE_OOM_LOCKED to avoid 
> > needlessly killing more than one task regardless of how many memcgs are 
> > oom.
> > 
> The current implementation achieves what memcg wants. Why remove it and destroy memcg?
> 
It might be a bit off-topic, but memcg's check for last_oom_jiffies does not seem
to work well under heavy load, and pagefault_out_of_memory() ends up causing a
global oom.

Step 1: make a memory cgroup directory and set memory.limit_in_bytes to a small value

  > mkdir /cgroup/memory/test
  > echo 1M >/cgroup/memory/test/memory.limit_in_bytes

Step 2: run the attached test program (which allocates memory and forks recursively)

  > ./recursive_fork -c 8 -s `expr 1 \* 1024 \* 1024`

This causes not only a memcg oom but also a global oom (my machine has 8 CPUs).

===
[348090.121808] recursive_fork3 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[348090.121821] recursive_fork3 cpuset=/ mems_allowed=0
[348090.121829] Pid: 22744, comm: recursive_fork3 Not tainted 2.6.32.8-00001-gb6cd517 #3
[348090.121832] Call Trace:
[348090.121849]  [<ffffffff810d6015>] oom_kill_process+0x86/0x295
[348090.121855]  [<ffffffff810d64cf>] ? select_bad_process+0x63/0xf0
[348090.121861]  [<ffffffff810d687a>] mem_cgroup_out_of_memory+0x69/0x87
[348090.121870]  [<ffffffff811119c2>] __mem_cgroup_try_charge+0x15f/0x1d4
[348090.121876]  [<ffffffff811126bc>] mem_cgroup_try_charge_swapin+0x104/0x159
[348090.121885]  [<ffffffff810edd9b>] handle_mm_fault+0x4ca/0x76c
[348090.121895]  [<ffffffff8143419f>] ? do_page_fault+0x141/0x2da
[348090.121904]  [<ffffffff81087286>] ? trace_hardirqs_on+0xd/0xf
[348090.121910]  [<ffffffff8143419f>] ? do_page_fault+0x141/0x2da
[348090.121915]  [<ffffffff8143431c>] do_page_fault+0x2be/0x2da
[348090.121922]  [<ffffffff81432115>] page_fault+0x25/0x30
[348090.121929] Task in /test killed as a result of limit of /test
[348090.121936] memory: usage 1024kB, limit 1024kB, failcnt 279335
[348090.121940] memory+swap: usage 4260kB, limit 9007199254740991kB, failcnt 0
[348090.121943] Mem-Info:
[348090.121947] Node 0 DMA per-cpu:
[348090.121952] CPU    0: hi:    0, btch:   1 usd:   0
[348090.121956] CPU    1: hi:    0, btch:   1 usd:   0
[348090.121960] CPU    2: hi:    0, btch:   1 usd:   0
[348090.121963] CPU    3: hi:    0, btch:   1 usd:   0
[348090.121967] CPU    4: hi:    0, btch:   1 usd:   0
[348090.121970] CPU    5: hi:    0, btch:   1 usd:   0
[348090.121974] CPU    6: hi:    0, btch:   1 usd:   0
[348090.121977] CPU    7: hi:    0, btch:   1 usd:   0
[348090.121980] Node 0 DMA32 per-cpu:
[348090.121984] CPU    0: hi:  186, btch:  31 usd:  19
[348090.121988] CPU    1: hi:  186, btch:  31 usd:  11
[348090.121992] CPU    2: hi:  186, btch:  31 usd: 178
[348090.121995] CPU    3: hi:  186, btch:  31 usd:   0
[348090.121999] CPU    4: hi:  186, btch:  31 usd: 182
[348090.122002] CPU    5: hi:  186, btch:  31 usd:  29
[348090.122006] CPU    6: hi:  186, btch:  31 usd:   0
[348090.122009] CPU    7: hi:  186, btch:  31 usd:   0
[348090.122012] Node 0 Normal per-cpu:
[348090.122016] CPU    0: hi:  186, btch:  31 usd:  54
[348090.122020] CPU    1: hi:  186, btch:  31 usd: 109
[348090.122023] CPU    2: hi:  186, btch:  31 usd: 149
[348090.122027] CPU    3: hi:  186, btch:  31 usd: 119
[348090.122030] CPU    4: hi:  186, btch:  31 usd: 123
[348090.122033] CPU    5: hi:  186, btch:  31 usd: 145
[348090.122037] CPU    6: hi:  186, btch:  31 usd:  54
[348090.122041] CPU    7: hi:  186, btch:  31 usd:  95
[348090.122049] active_anon:5354 inactive_anon:805 isolated_anon:0
[348090.122051]  active_file:18317 inactive_file:57785 isolated_file:0
[348090.122053]  unevictable:0 dirty:0 writeback:211 unstable:0
[348090.122054]  free:3324478 slab_reclaimable:18860 slab_unreclaimable:13472
[348090.122056]  mapped:4315 shmem:63 pagetables:1098 bounce:0
[348090.122059] Node 0 DMA free:15676kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15100kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[348090.122076] lowmem_reserve[]: 0 3204 13932 13932
[348090.122083] Node 0 DMA32 free:2773244kB min:3472kB low:4340kB high:5208kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3281248kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[348090.122100] lowmem_reserve[]: 0 0 10728 10728
[348090.122108] Node 0 Normal free:10508992kB min:11624kB low:14528kB high:17436kB active_anon:21416kB inactive_anon:3220kB active_file:73268kB inactive_file:231140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:10985984kB mlocked:0kB dirty:0kB writeback:844kB mapped:17260kB shmem:252kB slab_reclaimable:75440kB slab_unreclaimable:53872kB kernel_stack:1224kB pagetables:4392kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[348090.122125] lowmem_reserve[]: 0 0 0 0
[348090.122788] Node 0 DMA: 1*4kB 1*8kB 3*16kB 2*32kB 3*64kB 2*128kB 1*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15676kB
[348090.122853] Node 0 DMA32: 11*4kB 6*8kB 2*16kB 4*32kB 6*64kB 13*128kB 4*256kB 6*512kB 6*1024kB 4*2048kB 672*4096kB = 2773244kB
[348090.122915] Node 0 Normal: 188*4kB 128*8kB 214*16kB 409*32kB 107*64kB 18*128kB 4*256kB 1*512kB 2*1024kB 0*2048kB 2558*4096kB = 10508592kB
[348090.122936] 76936 total pagecache pages
[348090.122940] 816 pages in swap cache
[348090.122943] Swap cache stats: add 7851711, delete 7850894, find 3676243/4307445
[348090.122946] Free swap  = 1995492kB
[348090.122949] Total swap = 2000888kB
[348090.300467] 3670016 pages RAM
[348090.300471] 153596 pages reserved
[348090.300474] 38486 pages shared
[348090.300476] 162081 pages non-shared
[348090.300482] Memory cgroup out of memory: kill process 22072 (recursive_fork3) score 1248 or a child
[348090.300486] Killed process 22072 (recursive_fork3)
[348090.300524] Kernel panic - not syncing: out of memory from page fault. panic_on_oom is selected.
[348090.300526]
[348090.311038] Pid: 22744, comm: recursive_fork3 Not tainted 2.6.32.8-00001-gb6cd517 #3
[348090.311050] Call Trace:
[348090.311073]  [<ffffffff8142efa4>] panic+0x75/0x133
[348090.311090]  [<ffffffff810d67d2>] pagefault_out_of_memory+0x50/0x8f
[348090.311104]  [<ffffffff81036a2d>] mm_fault_error+0x37/0xba
[348090.311117]  [<ffffffff8143428d>] do_page_fault+0x22f/0x2da
[348090.311130]  [<ffffffff81432115>] page_fault+0x25/0x30
===

I took a kdump with panic_on_oom enabled and compared last_oom_jiffies against jiffies.

crash> struct mem_cgroup.last_oom_jiffies 0xffffc90013514000
  last_oom_jiffies = 4642757419,
crash> p jiffies
jiffies = $10 = 4642757607

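The gap is 188 jiffies, which already exceeds HZ/10 for any of the usual HZ settings,
so the memcg check no longer applies and the fault falls through to the global path.

I agree this is an extreme example, but it is not desirable behavior.
Changing "HZ/10" in mem_cgroup_last_oom_called() to "HZ/2" or so would fix
this case, but it is not an essential fix.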

Any thoughts?


Regards,
Daisuke Nishimura.

#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <libgen.h>
#include <errno.h>

void
recursive_fork(size_t size)
{
	pid_t pid;

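	/*
	 * Each child touches 'size' bytes and forks again, while each
	 * parent leaves the loop and exits. This builds a chain of
	 * short-lived processes that keep charging memory to the current
	 * cgroup. pid must hold fork()'s return value (not the result of
	 * the comparison), otherwise the fork error check below never fires.
	 */
	while ((pid = fork()) == 0)
	{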
		if (size) {
			void *buf;
			buf = malloc(size);
			if (!buf) {
				perror("malloc error");
				exit(-errno);
			}
			memset(buf, 0, size);
			free(buf);
		}
	}
	if (pid < 0) {
		perror("fork error(child)");
		exit(-errno);
	}
	exit(0);
}

void usage(const char *cmd)
{
	fprintf(stderr, "Usage: %s [-c <num child>] [-s <size in bytes>] [-p <cgroup path>]\n", cmd);
	exit(-1);
}

int
main(int argc, char *argv[])
{
	pid_t pid;
	int opt;
	unsigned int numchild = 1;
	size_t alloc_size = 0;
	char path[64];
	int fd;

	while ((opt = getopt(argc, argv, "c:s:p:")) != -1) {
		switch (opt) {
		case 'c':
			numchild = atoi(optarg);
			break;
		case 's':
			alloc_size = atoi(optarg);
			break;
		case 'p':
			snprintf(path, sizeof(path), "%s/tasks", optarg);
			fd = open(path, O_WRONLY);
			if (fd < 0) {
				perror("open error");
				exit(-errno);
			}
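			/* Writing pid "0" to the tasks file attaches the writer itself. */
			if (write(fd, "0", 1) != 1) {
				perror("write error");
				exit(-errno);
			}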
			close(fd);
			break;
		default:
			usage(basename(argv[0]));
		}
	}

	while (numchild--) {
		pid = fork();
		if (pid < 0) {
			perror("fork error(parent)");
			exit(-errno);
		}
		if (pid == 0)	/* child */
			recursive_fork(alloc_size);
	}

	return 0;
}
